Server Admin Log: Difference between revisions

imported>Labslogbot
(l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 31 02:31:43 UTC 2015 (duration 6m 52s) (logmsgbot))
imported>Stashbot
(ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) (duration: 00m 08s))
== 2015-12-31 ==
== 2022-01-23 ==
* 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 31 02:31:43 UTC 2015 (duration 6m 52s)
* 22:02 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) (duration: 00m 08s)
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 25s)
* 22:02 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@37937f6]: (no justification provided)
* 21:27 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided) (duration: 00m 09s)
* 21:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided)


== 2015-12-30 ==
== 2022-01-22 ==
* 15:05 andrewbogott: restarting puppet on seaborgium; openldap will restart but config should not change
* 22:38 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 14:57 andrewbogott: restarting puppet on serpens; openldap will restart but config should not change
* 22:38 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 13:14 mark: labstore1001: apt-get install irqbalance
* 14:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 08:01 jynus: setting dbstore1001 to read_only, converting ruwiki.recentchanges back to InnoDB
* 14:51 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 05:27 ejegg|away: set drupal variable wmf_common_requeue_max back to 10
* 08:35 elukey: `apt-get clean` on an-test-coord1001 to free some space
* 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Dec 30 02:32:15 UTC 2015 (duration 6m 55s)
* 08:25 elukey: remove the `--debug=true` etcd daemon arg from ml-etcd2002 (only node having it, probably a manual test done in the past) and cleaned up spammy etcd logs to free space
* 02:26 gwicke: restbase: finished full deploy of 7db8e216 (small bug fix & a security fix) to production cluster
* 01:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 50s)
* 01:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 02:17 gwicke: restbase: starting full deploy of 7db8e216 (small bug fix & a security fix) to production cluster
* 00:27 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
* 02:14 gwicke: restbase: canary deploy of 7db8e216 (small bug fix & a security fix) to restbase1001
* 00:08 andrewbogott: restarting nova-compute on labvirt1002
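
The cookbook entries above follow a fixed START / END (PASS|FAIL|ERROR) wording, so outcomes can be tallied mechanically. The following is a minimal sketch, not a Wikitech tool: the function name <code>cookbook_outcomes</code> and the regular expression are assumptions based only on the wording of the entries in this log.

```python
import re
from collections import Counter

# Assumed pattern, derived from entries such as:
#   END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for ...
END_ENTRY = re.compile(r"END \((PASS|FAIL|ERROR)\) - Cookbook (\S+) \(exit_code=(\d+)\)")


def cookbook_outcomes(lines):
    """Count END entries per (cookbook name, outcome) pair.

    Hypothetical helper: START entries and non-cookbook lines are ignored.
    """
    counts = Counter()
    for line in lines:
        match = END_ENTRY.search(line)
        if match:
            outcome, cookbook = match.group(1), match.group(2)
            counts[(cookbook, outcome)] += 1
    return counts
```

Feeding it the bullet lines of one day's section gives a quick view of which cookbooks failed and how often.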


== 2015-12-29 ==
== 2022-01-21 ==
* 23:56 andrewbogott: restarting nodepool on labnodepool1001
* 22:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 23:00 ejegg: re-enabled fundraising banner campaigns
* 22:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 22:50 ejegg: restarted fundraising scheduled jobs
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:14 ejegg: shut down campaigns and scheduled jobs due to dead ActiveMQ
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 22:13 andrewbogott: restarting slapd on seaborgium and serpens
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 22:11 andrewbogott: disabling puppet on seaborgium and serpens
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:47 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Turn off AB test for search lang detect via accept-language (duration: 00m 29s)
* 21:38 brennen@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/VisualEditor/modules/ve-mw: Backport: [[gerrit:756066{{!}}Revert "Re-duplicate deduplicated TemplateStyles" (T287675 T299251 T299767)]] (duration: 00m 49s)
* 16:04 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable cross-wiki upload A/B test [[gerrit:261371]] (duration: 00m 31s)
* 21:21 topranks: Running homer against cr1-eqiad and cr2-eqiad to remove entries on analytics-in4/6 filters that refer to decommissioned deb mirror host sodium.
* 15:06 apergos: labs salt instances salt update in progress. It's slow and tedious and automated. A few hundred instances already done, the rest are going one at a time. Only instances that use the labcontrol salt master will be affected.
* 19:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:29 apergos: salt wm2 packages now installed on all production hosts except for: mw1041.eqiad.wmnet, technetium.eqiad.wmnet, mw1228.eqiad.wmnet, ms-be1011.eqiad.wmnet
* 19:10 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 apergos: labcontrol*, neodymium and palladium updated to latest salt packages (wm2), rest of prod to follow
* 19:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:24 logmsgbot: reedy@tin Synchronized wmf-config/CommonSettings.php: Attempt to fix math related fatal (duration: 00m 33s)
* 19:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:54 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Setting weight values for s6 to original production values (duration: 00m 35s)
* 18:46 herron: restarting pybal on lvs1015,lvs1020,lvs2009,lvs2010 to remove legacy elk5 services [[phab:T299700|T299700]]
* 09:37 jynus: changing the mysql master of db2028, from db1030 to db1050
* 18:39 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:41 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Dec 29 02:41:49 UTC 2015 (duration 6m 38s)
* 18:36 robh@cumin1001: START - Cookbook sre.dns.netbox
* 02:35 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 15m 38s)
* 18:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:02 logmsgbot: awight@tin Synchronized php-1.27.0-wmf.9/extensions/CentralNotice: Update CentralNotice: T122251 (duration: 00m 34s)
* 18:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 17:42 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.4-1_amd64.changes
* 16:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
* 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1021.eqiad.wmnet with OS buster
* 16:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS buster
* 16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
* 16:46 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1020.eqiad.wmnet with OS buster
* 16:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sodium.wikimedia.org
* 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS buster
* 16:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 16:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 16:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts sodium.wikimedia.org
* 16:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
* 16:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1019.eqiad.wmnet with OS buster
* 16:02 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom
* 16:02 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom
* 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 15:50 moritzm: added ganeti1025 to Ganeti eqiad cluster [[phab:T293909|T293909]]
* 15:29 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:29 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster
* 15:24 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS buster
* 15:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1018.eqiad.wmnet
* 15:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1018.eqiad.wmnet with OS buster
* 15:07 herron: removing kibana.discovery.wmnet record and switching legacy elk LVS instances to state: lvs_setup [[phab:T299700|T299700]]
* 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster
* 14:35 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 07s)
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1018.eqiad.wmnet with OS buster
* 14:35 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 13:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
* 13:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1017.eqiad.wmnet with OS buster
* 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1017.eqiad.wmnet
* 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2025.codfw.wmnet
* 13:01 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 13:01 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1016.eqiad.wmnet
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2024.codfw.wmnet
* 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1017.eqiad.wmnet with OS buster
* 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
* 12:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
* 12:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1016.eqiad.wmnet with OS buster
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
* 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
* 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1016.eqiad.wmnet with OS buster
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster
* 11:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2023.codfw.wmnet
* 11:15 vgutierrez: pool cp3063 running envoy as TLS termination layer - [[phab:T271421|T271421]]
* 11:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2023.codfw.wmnet with OS buster
* 10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS buster
* 10:33 moritzm: migrate primary/secondary instances off ganeti1013
* 10:14 moritzm: switch kubetcd1006 back to plain disks
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks
* 10:09 moritzm: switch kubetcd1005 back to plain disks
* 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2023.codfw.wmnet with OS buster
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks
* 10:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks
* 09:51 moritzm: switch kubetcd1004 back to plain disks
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks
* 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
* 09:40 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3063.esams.wmnet with OS buster
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18970 and previous config saved to /var/cache/conftool/dbconfig/20220121-093120-root.json
* 09:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18969 and previous config saved to /var/cache/conftool/dbconfig/20220121-091617-root.json
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18968 and previous config saved to /var/cache/conftool/dbconfig/20220121-090113-root.json
* 09:00 vgutierrez: depool cp3063 to be reimaged as cache::upload_envoy - [[phab:T271421|T271421]]
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18967 and previous config saved to /var/cache/conftool/dbconfig/20220121-084609-root.json
* 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18966 and previous config saved to /var/cache/conftool/dbconfig/20220121-083106-root.json
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18965 and previous config saved to /var/cache/conftool/dbconfig/20220121-081602-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18964 and previous config saved to /var/cache/conftool/dbconfig/20220121-080058-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18963 and previous config saved to /var/cache/conftool/dbconfig/20220121-075801-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18962 and previous config saved to /var/cache/conftool/dbconfig/20220121-074555-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18961 and previous config saved to /var/cache/conftool/dbconfig/20220121-074257-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18960 and previous config saved to /var/cache/conftool/dbconfig/20220121-073051-root.json
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1032.eqiad.wmnet with OS bullseye
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18959 and previous config saved to /var/cache/conftool/dbconfig/20220121-072754-root.json
* 07:26 elukey: elukey@stat1007:~$ sudo systemctl reset-failed product-analytics-movement-metrics.service
* 07:21 elukey: elukey@build2001:~$ sudo systemctl reset-failed ifup@ens13.service
* 07:19 elukey: systemctl reset-failed session-3.scope on an-test-client1001 (failed, transient unit)
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18958 and previous config saved to /var/cache/conftool/dbconfig/20220121-071250-root.json
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1032.eqiad.wmnet with OS bullseye
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for reimage [[phab:T299741|T299741]]', diff saved to https://phabricator.wikimedia.org/P18957 and previous config saved to /var/cache/conftool/dbconfig/20220121-065854-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18956 and previous config saved to /var/cache/conftool/dbconfig/20220121-065746-root.json
* 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS bullseye
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18955 and previous config saved to /var/cache/conftool/dbconfig/20220121-064243-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18954 and previous config saved to /var/cache/conftool/dbconfig/20220121-062739-root.json
* 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS bullseye
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master [[phab:T299741|T299741]]', diff saved to https://phabricator.wikimedia.org/P18953 and previous config saved to /var/cache/conftool/dbconfig/20220121-062116-marostegui.json
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2030.codfw.wmnet with OS bullseye
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18952 and previous config saved to /var/cache/conftool/dbconfig/20220121-061235-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18951 and previous config saved to /var/cache/conftool/dbconfig/20220121-055732-root.json
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2030.codfw.wmnet with OS bullseye
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18950 and previous config saved to /var/cache/conftool/dbconfig/20220121-054228-root.json
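
The dbctl entries above show hosts being repooled in steps (for example es1022 at 1%, 5%, 10% and so on up to 100%). As a rough sketch, the steps for each host can be pulled out of the log text and put in chronological order. The helper name <code>repool_steps</code> and the regular expression are assumptions derived from the wording of these entries, not part of dbctl or conftool, and sorting by time is only valid within a single day's section.

```python
import re

# Assumed pattern, derived from entries such as:
#   * 07:45 user@host: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: ...'
REPOOL_ENTRY = re.compile(
    r"^\* (\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time of day.

    Hypothetical helper: the log lists newest entries first, so each
    host's steps are sorted to read oldest-first.
    """
    steps = {}
    for line in lines:
        match = REPOOL_ENTRY.match(line)
        if match:
            time, host, percent = match.group(1), match.group(2), int(match.group(3))
            steps.setdefault(host, []).append((time, percent))
    for host_steps in steps.values():
        host_steps.sort()
    return steps
```

Entries that do not match the repooling wording, such as the depool and master promotion commits, are skipped.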


== 2015-12-28 ==
== 2022-01-20 ==
* 21:45 gwicke: restbase: rolling restart to apply https://gerrit.wikimedia.org/r/261206
* 22:40 inflatador: running puppet-merge for https://gerrit.wikimedia.org/r/755810
* 21:26 mutante: tin & mira: started salt minions that were in status stop/waiting
* 22:27 urandom: rolling restart of Cassandra, aqs-next -- [[phab:T298516|T298516]]
* 21:25 logmsgbot: aaron@tin Synchronized private/PrivateSettings.php: (no message) (duration: 00m 30s)
* 21:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
* 21:20 logmsgbot: aaron@tin Synchronized wmf-config/PrivateSettings.php: $wmfSwiftConfig convenience variable (duration: 00m 30s)
* 20:58 jhathaway: rebooting mx1001 to test new kernel
* 20:59 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1050 & db1022 after emergency fix (duration: 00m 31s)
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:52 mutante: cygnus - starting salt-minion
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:35 logmsgbot: yurik@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/modules/graph2.js: https://gerrit.wikimedia.org/r/#/c/261200/ (duration: 00m 31s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:22 ejegg: updated DjangoBannerStats from 8d4a9062aab80e5371faebadd72fbe4f19ac2fdd to a64fe0e373a978d3df0b7f1dd74ac4cc5c78d34e
* 20:37 urandom: upgrading Cassandra to 3.11.11, aqs1010 -- [[phab:T298516|T298516]]
* 18:46 jynus: importing wikishared from x1-master into analytics-slave and setting up replication
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:28 jynus: restarting and upgrading db1050, using the fact that it is depooled
* 20:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]]
* 17:16 paravoid: disabled varnish TBF and force-ran puppet on all cp* hosts (I12ea52165e125aaf4ed779399f34cff16d5cd140)
* 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 16:38 jynus: applying production-side replication filters for wikimania2017wiki on labs
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:24 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 30s)
* 20:31 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/DiscussionTools/includes/HeadingItem.php: Backport: [[gerrit:755684{{!}}Prevent assertion failure caused by empty headings (T299583)]] (duration: 00m 50s)
* 16:14 logmsgbot: krenair@tin Synchronized dblists: (no message) (duration: 00m 29s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:13 logmsgbot: krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:12 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/wikimania2017wiki.png: (no message) (duration: 00m 31s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:12 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/260521/ (duration: 00m 30s)
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:55 jynus: cloning db1050's mysql data to db1022
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:15 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Emergency depool of db1050 (duration: 00m 31s)
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Dec 28 02:30:47 UTC 2015 (duration 6m 58s)
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 55s)
* 19:38 bd808@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Remove password clear on block (duration: 00m 50s)
* 19:19 jhathaway: rebooting mx1001 to test new kernel
* 19:17 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:14 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 19:13 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 19:11 cjming: end of UTC evening backport & config window
* 19:10 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 19:10 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:08 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755745{{!}}Disable language alert for pilot wikis except thwiki, viwiki. (T295555)]] (duration: 00m 51s)
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/includes/Hooks.php: Backport: [[gerrit:755682{{!}}Do not try to make watchlist collapsible on wikis where watchlist is disabled (T299671)]] (duration: 00m 50s)
* 18:27 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755741 enhancements for the settings benchmark entrypoint (duration: 00m 51s)
* 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
* 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS buster
* 18:17 mutante: running puppet on cp403*
* 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS buster
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
* 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS buster
* 17:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
* 17:18 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755678{{!}}Revert "Make Block objects aware of which wiki they belong to"]] (duration: 00m 55s)
* 17:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 17:15 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1008.eqiad.wmnet with OS buster
* 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS buster
* 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:04 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 17:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2020.codfw.wmnet with OS buster
* 17:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:55 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS buster
* 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
* 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:50 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755399 add temporary entrypoint for settings benchmark (duration: 00m 50s)
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
* 16:48 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
* 16:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS buster
* 15:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS buster
* 15:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 15:46 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:43 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:20 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 15:16 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
* 15:14 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 15:13 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:12 moritzm: enabled hardware virtualisation in BIOS for ganeti1028 [[phab:T293909|T293909]]
* 15:11 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 15:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
* 15:05 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 15:05 moritzm: enabled hardware virtualisation in BIOS for ganeti1027 [[phab:T293909|T293909]]
* 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 14:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
* 14:56 moritzm: enabled hardware virtualisation in BIOS for ganeti1026 [[phab:T293909|T293909]]
* 14:55 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 11s)
* 14:55 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
* 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
* 14:20 moritzm: enabled hardware virtualisation in BIOS for ganeti1023 [[phab:T283036|T283036]]
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
* 14:03 moritzm: enabled hardware virtualisation in BIOS for ganeti1024 [[phab:T283036|T283036]]
* 13:55 marostegui: Power off es1022 for onsite maintenance [[phab:T299123|T299123]]
* 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
* 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
* 13:51 moritzm: enabled hardware virtualisation in BIOS for ganeti1025 [[phab:T293909|T293909]]
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
* 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
* 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/CentralNotice/includes/: Backport: [[gerrit:755670{{!}}Replace remaining usages of IDatabase::fetchObject()/::numRows() (T286694)]] (duration: 00m 50s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:03 Lucas_WMDE: UTC morning backport window done
* 13:02 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/deferred/LinksUpdate/LinksUpdate.php: Backport: [[gerrit:755668{{!}}Fix deprecation warning from LinksUpdate::getImages() (T299472)]] (duration: 00m 50s)
* 13:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/maintenance/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (2/2) (duration: 00m 50s)
* 13:00 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (1/2) (duration: 00m 56s)
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755322{{!}}Enable usage tracking for statements in Waray Wikipedia (T296383)]] (expecting some gradual increase of wbc_entity_usage rows on warwiki) (duration: 00m 51s)
* 12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18943 and previous config saved to /var/cache/conftool/dbconfig/20220120-121520-marostegui.json
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
* 12:10 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
* 12:09 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
* 12:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
* 12:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
* 12:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 12:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18942 and previous config saved to /var/cache/conftool/dbconfig/20220120-120015-marostegui.json
* 11:49 moritzm: add ganeti1024 to Ganeti eqiad cluster [[phab:T283036|T283036]]
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18941 and previous config saved to /var/cache/conftool/dbconfig/20220120-114510-marostegui.json
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
* 11:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18940 and previous config saved to /var/cache/conftool/dbconfig/20220120-113006-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18939 and previous config saved to /var/cache/conftool/dbconfig/20220120-112854-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18938 and previous config saved to /var/cache/conftool/dbconfig/20220120-112846-marostegui.json
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:24 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
* 11:23 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
* 11:23 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
* 11:22 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
* 11:21 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:19 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
* 11:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 11:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 11:18 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
* 11:18 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
* 11:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
* 11:13 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
* 11:13 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18937 and previous config saved to /var/cache/conftool/dbconfig/20220120-111341-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18936 and previous config saved to /var/cache/conftool/dbconfig/20220120-105837-marostegui.json
* 10:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 10:52 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1018.eqiad.wmnet with OS buster
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18935 and previous config saved to /var/cache/conftool/dbconfig/20220120-104332-marostegui.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18934 and previous config saved to /var/cache/conftool/dbconfig/20220120-104220-marostegui.json
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18933 and previous config saved to /var/cache/conftool/dbconfig/20220120-104206-marostegui.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18932 and previous config saved to /var/cache/conftool/dbconfig/20220120-102702-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18931 and previous config saved to /var/cache/conftool/dbconfig/20220120-101157-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18930 and previous config saved to /var/cache/conftool/dbconfig/20220120-095652-marostegui.json
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 09:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1018.eqiad.wmnet with OS buster
* 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18929 and previous config saved to /var/cache/conftool/dbconfig/20220120-092232-marostegui.json
* 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18928 and previous config saved to /var/cache/conftool/dbconfig/20220120-092225-marostegui.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18927 and previous config saved to /var/cache/conftool/dbconfig/20220120-091127-root.json
* 09:09 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18926 and previous config saved to /var/cache/conftool/dbconfig/20220120-090720-marostegui.json
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18925 and previous config saved to /var/cache/conftool/dbconfig/20220120-085623-root.json
* 08:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18924 and previous config saved to /var/cache/conftool/dbconfig/20220120-085215-marostegui.json
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18923 and previous config saved to /var/cache/conftool/dbconfig/20220120-084120-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18922 and previous config saved to /var/cache/conftool/dbconfig/20220120-083711-marostegui.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18921 and previous config saved to /var/cache/conftool/dbconfig/20220120-083558-marostegui.json
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18920 and previous config saved to /var/cache/conftool/dbconfig/20220120-083520-marostegui.json
* 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18919 and previous config saved to /var/cache/conftool/dbconfig/20220120-082616-root.json
* 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18918 and previous config saved to /var/cache/conftool/dbconfig/20220120-082015-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for on-site maintenance [[phab:T299123|T299123]]', diff saved to https://phabricator.wikimedia.org/P18917 and previous config saved to /var/cache/conftool/dbconfig/20220120-081809-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18916 and previous config saved to /var/cache/conftool/dbconfig/20220120-081112-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18915 and previous config saved to /var/cache/conftool/dbconfig/20220120-080510-marostegui.json
* 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:57 marostegui: Stop mysql on db1117 to clone db1128 [[phab:T299344|T299344]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18913 and previous config saved to /var/cache/conftool/dbconfig/20220120-075609-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18912 and previous config saved to /var/cache/conftool/dbconfig/20220120-075005-marostegui.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18911 and previous config saved to /var/cache/conftool/dbconfig/20220120-074753-marostegui.json
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18910 and previous config saved to /var/cache/conftool/dbconfig/20220120-074746-marostegui.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18909 and previous config saved to /var/cache/conftool/dbconfig/20220120-074105-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18908 and previous config saved to /var/cache/conftool/dbconfig/20220120-073241-marostegui.json
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18907 and previous config saved to /var/cache/conftool/dbconfig/20220120-072558-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18906 and previous config saved to /var/cache/conftool/dbconfig/20220120-071736-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18905 and previous config saved to /var/cache/conftool/dbconfig/20220120-071054-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18904 and previous config saved to /var/cache/conftool/dbconfig/20220120-070231-marostegui.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18903 and previous config saved to /var/cache/conftool/dbconfig/20220120-070119-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18902 and previous config saved to /var/cache/conftool/dbconfig/20220120-070052-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18901 and previous config saved to /var/cache/conftool/dbconfig/20220120-065551-root.json
* 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1180.eqiad.wmnet with OS bullseye
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18900 and previous config saved to /var/cache/conftool/dbconfig/20220120-064547-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18899 and previous config saved to /var/cache/conftool/dbconfig/20220120-063042-marostegui.json
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1180.eqiad.wmnet with OS bullseye
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18898 and previous config saved to /var/cache/conftool/dbconfig/20220120-061538-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P18897 and previous config saved to /var/cache/conftool/dbconfig/20220120-061529-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18896 and previous config saved to /var/cache/conftool/dbconfig/20220120-061407-marostegui.json
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance


== 2022-01-19 ==
* 23:36 mutante: deploy1002 - checked freshly generated cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml  with 'openssl x509 -noout -text -in .. {{!}} grep DNS'. now has static-bz on it. ([[phab:T281538|T281538]])
* 23:35 mutante: puppetmaster1001 - revoked puppet cert miscweb.discovery.wmnet; updated kube_services.crts.yaml to include static-bugzilla.wikimedia.org, removed miscweb.discovery.wmnet.crt and .csr.pem, used cergen to check and regenerate cert, committed in private repo, ran puppet on deploy1001 - checked cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml  with 'openssl x509
* 21:43 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 26s)
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 20:52 Krinkle: depool mw1340 (api_appserver) for performance and php-apcu testing
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:09 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]] (duration: 00m 49s)
* 20:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]]
* 20:04 jhathaway: rebooting mx1001 to debug conntrack
* 19:52 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/tests/phpunit/structure/SettingsTest.php: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 00m 49s)
* 19:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 01m 02s)
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1009.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1008.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1007.eqiad.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2006.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2005.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2004.codfw.wmnet
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
* 19:31 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS buster
* 19:17 cjming@deploy1002: Synchronized wmf-config/config: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:13 cjming@deploy1002: Synchronized wmf-config/config: message (duration: 00m 50s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:12 cjming@deploy1002: Synchronized wmf-config/config/foundationwiki.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 cjming@deploy1002: Synchronized wmf-config/config/viwiki.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 49s)
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 cjming@deploy1002: Synchronized wmf-config/config/ptwikinews.yaml: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 00m 50s)
* 19:09 cjming@deploy1002: Synchronized dblists/desktop-improvements.dblist: Config: [[gerrit:755038{{!}}Update config for pilot wikis: (T298519)]] (duration: 01m 09s)
* 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18893 and previous config saved to /var/cache/conftool/dbconfig/20220119-190137-ladsgroup.json
* 18:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18892 and previous config saved to /var/cache/conftool/dbconfig/20220119-184632-ladsgroup.json
* 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18891 and previous config saved to /var/cache/conftool/dbconfig/20220119-183128-ladsgroup.json
* 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18890 and previous config saved to /var/cache/conftool/dbconfig/20220119-181623-ladsgroup.json
* 18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1110.eqiad.wmnet
* 18:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 18:09 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1110.eqiad.wmnet
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18889 and previous config saved to /var/cache/conftool/dbconfig/20220119-180840-ladsgroup.json
* 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18888 and previous config saved to /var/cache/conftool/dbconfig/20220119-180154-ladsgroup.json
* 17:58 herron: beginning logstash apifeatureusage switchover [[phab:T297239|T297239]]
* 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:54 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:52 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 17:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:575390{{!}}[wikitech] Drop the cloudadmin user group, no longer used and empty (T237890)]] (duration: 00m 50s)
* 17:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:47 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754999{{!}}Disable UserMerge (T216089)]] (duration: 00m 54s)
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18887 and previous config saved to /var/cache/conftool/dbconfig/20220119-174650-ladsgroup.json
* 17:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:42 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754998{{!}}Drop CentralAuthUserMerge log channel (T216089)]] (duration: 01m 05s)
* 17:36 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:35 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18886 and previous config saved to /var/cache/conftool/dbconfig/20220119-173145-ladsgroup.json
* 17:26 _joe_: powercycling contint1001 via ipmi, [[phab:T299542|T299542]]
* 17:25 cmjohnson1: updating firmware, ganeti1018 [[phab:T299527|T299527]]
* 17:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 17:18 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS buster
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18885 and previous config saved to /var/cache/conftool/dbconfig/20220119-171640-ladsgroup.json
* 16:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
* 16:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
* 16:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS buster
* 16:48 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:36 hashar: marking contint1001.wikimedia.org as offline in Jenkins since it is dramatically overloaded [[phab:T299542|T299542]]
* 16:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18883 and previous config saved to /var/cache/conftool/dbconfig/20220119-162717-marostegui.json
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18882 and previous config saved to /var/cache/conftool/dbconfig/20220119-161212-marostegui.json
* 16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS buster
* 16:00 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[134].codfw.wmnet
* 15:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS buster
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18881 and previous config saved to /var/cache/conftool/dbconfig/20220119-155706-marostegui.json
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:48 moritzm: installing tiff security updates on stretch
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18879 and previous config saved to /var/cache/conftool/dbconfig/20220119-154201-marostegui.json
* 15:40 mmandere: cp5005,cp4025: upgrade varnish to 6.0.9 [[phab:T298758|T298758]]
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18878 and previous config saved to /var/cache/conftool/dbconfig/20220119-154046-marostegui.json
* 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18877 and previous config saved to /var/cache/conftool/dbconfig/20220119-154039-marostegui.json
* 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18876 and previous config saved to /var/cache/conftool/dbconfig/20220119-152534-marostegui.json
* 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
* 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS buster
* 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18875 and previous config saved to /var/cache/conftool/dbconfig/20220119-151029-marostegui.json
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS buster
* 15:07 jbond: updating lldp parent fact
* 15:01 moritzm: migrate primary/secondary instances off ganeti1022
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1018.eqiad.wmnet with OS buster
* 14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18873 and previous config saved to /var/cache/conftool/dbconfig/20220119-145525-marostegui.json
* 14:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18872 and previous config saved to /var/cache/conftool/dbconfig/20220119-145410-marostegui.json
* 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18871 and previous config saved to /var/cache/conftool/dbconfig/20220119-145402-marostegui.json
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18870 and previous config saved to /var/cache/conftool/dbconfig/20220119-143858-marostegui.json
* 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:33 jayme: disabled insecure API on all k8s masters - [[phab:T290967|T290967]]
* 14:33 mmandere: esams: upgrade varnish to 6.0.9 [[phab:T298758|T298758]]
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 14:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS buster
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18869 and previous config saved to /var/cache/conftool/dbconfig/20220119-142353-marostegui.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18868 and previous config saved to /var/cache/conftool/dbconfig/20220119-140848-marostegui.json
* 14:04 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db1100.eqiad.wmnet
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18867 and previous config saved to /var/cache/conftool/dbconfig/20220119-140433-marostegui.json
* 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18866 and previous config saved to /var/cache/conftool/dbconfig/20220119-140419-marostegui.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18865 and previous config saved to /var/cache/conftool/dbconfig/20220119-134915-marostegui.json
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:36 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1100.eqiad.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T239814|T239814]])', diff saved to https://phabricator.wikimedia.org/P18864 and previous config saved to /var/cache/conftool/dbconfig/20220119-133514-ladsgroup.json
* 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18863 and previous config saved to /var/cache/conftool/dbconfig/20220119-133410-marostegui.json
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:26 hashar: Restarting Gerrit
* 13:24 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # [[phab:T299451|T299451]] (duration: 00m 08s)
* 13:24 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # [[phab:T299451|T299451]]
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:22 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.16 (duration: 01m 32s)
* 13:20 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.12 (duration: 01m 43s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:19 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.12` apparently missed in previous train  # [[phab:T293958|T293958]] [[phab:T293959|T293959]]
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18862 and previous config saved to /var/cache/conftool/dbconfig/20220119-131905-marostegui.json
* 13:18 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.13 (duration: 03m 11s)
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18861 and previous config saved to /var/cache/conftool/dbconfig/20220119-131750-marostegui.json
* 13:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 13:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18860 and previous config saved to /var/cache/conftool/dbconfig/20220119-131743-marostegui.json
* 13:16 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.13` apparently missed in previous train  # [[phab:T293958|T293958]] [[phab:T293959|T293959]]
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:13 Lucas_WMDE: UTC morning backport+config window done
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:08 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 2) (duration: 29m 08s)
* 13:05 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u www-data rm /tmp/URL*.urlupload_ # save space
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18859 and previous config saved to /var/cache/conftool/dbconfig/20220119-130238-marostegui.json
* 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from dbctl [[phab:T299344|T299344]]', diff saved to https://phabricator.wikimedia.org/P18858 and previous config saved to /var/cache/conftool/dbconfig/20220119-125658-marostegui.json
* 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1155.eqiad.wmnet with OS bullseye
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18857 and previous config saved to /var/cache/conftool/dbconfig/20220119-124733-marostegui.json
* 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:38 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 2)
* 12:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/MediaSearch/extension.json: Backport: [[gerrit:753487{{!}}Revert "Undo update to the way the search interface is set"]] (part 1) (duration: 01m 34s)
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18856 and previous config saved to /var/cache/conftool/dbconfig/20220119-123229-marostegui.json
* 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/Flow/modules/flow/ui/widgets/mw.flow.ui.TopicMenuSelectWidget.js: Backport: [[gerrit:754921{{!}}Fix TopicMenuSelectWidget after OOUI change (T299473)]] (duration: 01m 08s)
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18855 and previous config saved to /var/cache/conftool/dbconfig/20220119-123114-marostegui.json
* 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18854 and previous config saved to /var/cache/conftool/dbconfig/20220119-123106-marostegui.json
* 12:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[12].codfw.wmnet
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1155.eqiad.wmnet with OS bullseye
* 12:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2012.codfw.wmnet with OS buster
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18853 and previous config saved to /var/cache/conftool/dbconfig/20220119-121602-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18852 and previous config saved to /var/cache/conftool/dbconfig/20220119-120057-marostegui.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18851 and previous config saved to /var/cache/conftool/dbconfig/20220119-114949-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18850 and previous config saved to /var/cache/conftool/dbconfig/20220119-114944-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18849 and previous config saved to /var/cache/conftool/dbconfig/20220119-114552-marostegui.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18848 and previous config saved to /var/cache/conftool/dbconfig/20220119-114237-marostegui.json
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18847 and previous config saved to /var/cache/conftool/dbconfig/20220119-114154-marostegui.json
* 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2012.codfw.wmnet with OS buster
* 11:35 moritzm: rebalance ganeti group D in codfw after adding ganeti2026 [[phab:T282603|T282603]]
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18846 and previous config saved to /var/cache/conftool/dbconfig/20220119-113445-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18845 and previous config saved to /var/cache/conftool/dbconfig/20220119-113440-root.json
* 11:32 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 18m 27s)
* 11:28 godog: bounce superset on an-tool1005 - [[phab:T299383|T299383]]
* 11:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2011.codfw.wmnet with OS buster
* 11:28 godog: bounce superset on an-tool1010 - [[phab:T299383|T299383]]
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18844 and previous config saved to /var/cache/conftool/dbconfig/20220119-112649-marostegui.json
* 11:26 godog: bounce navtiming on webperf1001 - [[phab:T299383|T299383]]
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18843 and previous config saved to /var/cache/conftool/dbconfig/20220119-111942-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18842 and previous config saved to /var/cache/conftool/dbconfig/20220119-111937-root.json
* 11:15 moritzm: add ganeti2026 to Ganeti codfw cluster [[phab:T282603|T282603]]
* 11:14 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
* 11:12 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001 (duration: 01m 00s)
* 11:12 filippo@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:754879{{!}}Revert "ProductionServices: use graphite2003 for statsd" (T299383)]] (duration: 02m 09s)
* 11:11 oblivian@deploy1002: Started deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18840 and previous config saved to /var/cache/conftool/dbconfig/20220119-111144-marostegui.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18839 and previous config saved to /var/cache/conftool/dbconfig/20220119-110438-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18838 and previous config saved to /var/cache/conftool/dbconfig/20220119-110433-root.json
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:58 godog: flip graphite back to eqiad - [[phab:T299383|T299383]]
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18837 and previous config saved to /var/cache/conftool/dbconfig/20220119-105640-marostegui.json
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18836 and previous config saved to /var/cache/conftool/dbconfig/20220119-105523-marostegui.json
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18835 and previous config saved to /var/cache/conftool/dbconfig/20220119-104934-root.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18834 and previous config saved to /var/cache/conftool/dbconfig/20220119-104929-root.json
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
* 10:42 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18833 and previous config saved to /var/cache/conftool/dbconfig/20220119-104109-marostegui.json
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2011.codfw.wmnet with OS buster
* 10:40 ayounsi@deploy1002: Finished deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0 (duration: 01m 26s)
* 10:39 ayounsi@deploy1002: Started deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0
* 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2010.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18832 and previous config saved to /var/cache/conftool/dbconfig/20220119-103431-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18831 and previous config saved to /var/cache/conftool/dbconfig/20220119-103425-root.json
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18830 and previous config saved to /var/cache/conftool/dbconfig/20220119-102604-marostegui.json
* 10:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 10:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18829 and previous config saved to /var/cache/conftool/dbconfig/20220119-101927-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18828 and previous config saved to /var/cache/conftool/dbconfig/20220119-101922-root.json
* 10:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply on production
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply on staging
* 10:15 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync on production
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18827 and previous config saved to /var/cache/conftool/dbconfig/20220119-101100-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18826 and previous config saved to /var/cache/conftool/dbconfig/20220119-100424-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18825 and previous config saved to /var/cache/conftool/dbconfig/20220119-100418-root.json
* 10:03 hashar: Upgraded gerrit-replica.wikimedia.org from 3.3.6 to 3.3.9
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18824 and previous config saved to /var/cache/conftool/dbconfig/20220119-095555-marostegui.json
* 09:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # [[phab:T299451|T299451]] (duration: 00m 09s)
* 09:54 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # [[phab:T299451|T299451]]
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18823 and previous config saved to /var/cache/conftool/dbconfig/20220119-095428-marostegui.json
* 09:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18822 and previous config saved to /var/cache/conftool/dbconfig/20220119-095421-marostegui.json
* 09:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
* 09:49 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18821 and previous config saved to /var/cache/conftool/dbconfig/20220119-094920-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18820 and previous config saved to /var/cache/conftool/dbconfig/20220119-094914-root.json
* 09:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply on staging
* 09:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply on production
* 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync on production
* 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply on staging
* 09:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply on production
* 09:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync on staging
* 09:43 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply on production
* 09:43 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply on staging
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18819 and previous config saved to /var/cache/conftool/dbconfig/20220119-093915-marostegui.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18818 and previous config saved to /var/cache/conftool/dbconfig/20220119-093416-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18817 and previous config saved to /var/cache/conftool/dbconfig/20220119-093411-root.json
* 09:32 XioNoX: enable v6 BGP to HE in eqiad for testing
* 09:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1098.eqiad.wmnet with OS bullseye
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18816 and previous config saved to /var/cache/conftool/dbconfig/20220119-092410-marostegui.json
* 09:20 moritzm: migrate primary/secondary instances off ganeti1018
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18813 and previous config saved to /var/cache/conftool/dbconfig/20220119-090905-marostegui.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18812 and previous config saved to /var/cache/conftool/dbconfig/20220119-090839-marostegui.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18811 and previous config saved to /var/cache/conftool/dbconfig/20220119-090832-marostegui.json
* 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1098.eqiad.wmnet with OS bullseye
* 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2129.codfw.wmnet with OS bullseye
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 (s6,s7) for Bullseye reimage [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P18809 and previous config saved to /var/cache/conftool/dbconfig/20220119-085927-marostegui.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18808 and previous config saved to /var/cache/conftool/dbconfig/20220119-085327-marostegui.json
* 08:50 XioNoX: disable v6 BGP to HE in eqiad for testing
* 08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
* 08:45 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply on staging
* 08:45 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply on production
* 08:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
* 08:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply on staging
* 08:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply on production
* 08:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on staging
* 08:39 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:39 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18807 and previous config saved to /var/cache/conftool/dbconfig/20220119-083822-marostegui.json
* 08:35 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:35 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 08:33 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2076.codfw.wmnet with OS bullseye
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2129.codfw.wmnet with OS bullseye
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2114.codfw.wmnet with OS bullseye
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18806 and previous config saved to /var/cache/conftool/dbconfig/20220119-082318-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18805 and previous config saved to /var/cache/conftool/dbconfig/20220119-081650-marostegui.json
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18804 and previous config saved to /var/cache/conftool/dbconfig/20220119-081643-marostegui.json
* 08:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2010.codfw.wmnet with OS buster
* 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18803 and previous config saved to /var/cache/conftool/dbconfig/20220119-080138-marostegui.json
* 07:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2114.codfw.wmnet with OS bullseye
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2076.codfw.wmnet with OS bullseye
* 07:52 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
* 07:52 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2124.codfw.wmnet with OS bullseye
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2117.codfw.wmnet with OS bullseye
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18802 and previous config saved to /var/cache/conftool/dbconfig/20220119-074633-marostegui.json
* 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2089.codfw.wmnet with OS bullseye
* 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2095.codfw.wmnet with OS bullseye
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18801 and previous config saved to /var/cache/conftool/dbconfig/20220119-073129-marostegui.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18800 and previous config saved to /var/cache/conftool/dbconfig/20220119-072301-marostegui.json
* 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18799 and previous config saved to /var/cache/conftool/dbconfig/20220119-072253-marostegui.json
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2124.codfw.wmnet with OS bullseye
* 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2117.codfw.wmnet with OS bullseye
* 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2089.codfw.wmnet with OS bullseye
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18797 and previous config saved to /var/cache/conftool/dbconfig/20220119-070749-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18796 and previous config saved to /var/cache/conftool/dbconfig/20220119-065318-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18795 and previous config saved to /var/cache/conftool/dbconfig/20220119-065244-marostegui.json
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2095.codfw.wmnet with OS bullseye
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18794 and previous config saved to /var/cache/conftool/dbconfig/20220119-063739-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18793 and previous config saved to /var/cache/conftool/dbconfig/20220119-063613-marostegui.json
* 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18792 and previous config saved to /var/cache/conftool/dbconfig/20220119-063605-marostegui.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18791 and previous config saved to /var/cache/conftool/dbconfig/20220119-062100-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18790 and previous config saved to /var/cache/conftool/dbconfig/20220119-060555-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18789 and previous config saved to /var/cache/conftool/dbconfig/20220119-055051-marostegui.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18788 and previous config saved to /var/cache/conftool/dbconfig/20220119-054924-marostegui.json
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 01:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753192{{!}}DiscussionTools: Use bullet indentation on ruwiki (T259864)]] (duration: 00m 53s)
* 01:05 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753543{{!}}[wmf-config] Deploy the cawiki test safety survey to production. (T296657)]] (duration: 00m 53s)
* 01:02 catrope@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/DiscussionTools: Backport: [[gerrit:754915{{!}}Enable wikis to customize the syntax used for replies (T259864)]] and [[gerrit:754916{{!}}Ensure the marker appears in a reasonable place when replying with a bullet (T259864)]] (duration: 00m 53s)
* 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/AbuseFilter/: Backport: [[gerrit:754917{{!}}Don't use array keys for OOUI (T299463)]] and [[gerrit:754918{{!}}Don't use array keys for OOUI in AbuseFilterViewDiff (T299463)]] (duration: 00m 54s)
* 00:49 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754054{{!}}Change TheWikipediaLibrary editcount (T288070)]] (duration: 00m 53s)
* 00:38 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:752308{{!}}Use namespaced CentralAuthUser (T298840)]] (duration: 00m 54s)
* 00:35 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754914{{!}}Revert "commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist"]] (duration: 00m 54s)
* 00:33 WFan: re-enable the disabled jobs for civicrm upgrade
* 00:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755026{{!}}azwiki: Change alias Q to QA for the draft namespace (T299332)]] (duration: 00m 53s)
* 00:08 WFan: Upgrade CiviCrm from gerrit #755044
* 00:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755018{{!}}fawiki: Exempt userspaces from being indexed by search engines (T299363)]] (duration: 00m 54s)
* 00:00 WFan: disabling jobs for civiCrm upgrade


== 2015-12-26 ==
== 2022-01-18 ==
* 19:12 paravoid: restarting varnish-frontend on cp3042
* 23:11 jhathaway: rebooting mx1001 to revert to the old kernel
* 19:06 jynus: setting db1030 as the new master of db2028
* 22:59 sbassett: Deployed security patch for [[phab:T298434|T298434]] to 1.38.0-wmf.18
* 18:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Emergency depool of db1022 (duration: 00m 30s)
* 22:57 sbassett: Deployed security patch for [[phab:T298434|T298434]] to 1.38.0-wmf.17
* 18:28 jynus: disabling lag notifications for codfw (s6)
* 21:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.18  refs [[phab:T293959|T293959]]
* 18:01 paravoid: cp3048: cleaned up /run/vmod_tbf/tbf.db/, kept a backup copy under ~faidon
* 21:29 jhuneidi@deploy1002: Finished scap: testwikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]] (duration: 38m 31s)
* 17:55 paravoid: cp3048: service varnish-frontend stop (sending 429 to lots of people, T122453)
* 21:26 jhathaway: rebooting mx1001, to test new kernel
* 05:38 paravoid: rolling restart of hhvm jobrunners (T122069)
* 20:50 jhuneidi@deploy1002: Started scap: testwikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]]
* 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Dec 26 02:31:39 UTC 2015 (duration 6m 54s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 39s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ff5874469b717cba38ed7cff0669754517a3553}}: pwnwiki: Deploy Growth features to newcomers ([[phab:T298115|T298115]]) (duration: 02m 14s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:57 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 13 hours)
* 17:37 hashar: restarted zuul on contint2001
* 17:16 moritzm: installing gmp security updates
* 16:53 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:52 hashar: contint2001: restarted ferm service
* 16:49 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:47 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2010.codfw.wmnet with OS buster
* 16:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 16:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:14 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
* 16:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
* 16:03 moritzm: installing xen security updates on buster (client-side libraries)
* 15:59 hashar: Shutting down CI for maintenance on contint2001 # [[phab:T283582|T283582]]
* 15:54 godog: update kartotherian certs on maps hosts and roll-reload nginx - [[phab:T297604|T297604]]
* 15:54 moritzm: installing libssh2 security updates on stretch
* 15:50 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
* 15:50 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:47 andrewbogott: resizing the wikitech-static host for [[phab:T298052|T298052]]
* 15:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 02s)
* 15:45 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 15:35 godog: regenerate kartotherian certs via cergen - [[phab:T297604|T297604]]
* 14:33 kormat: Deploying wmfmariadbpy 0.8 [[phab:T299406|T299406]]
* 14:33 kormat: uploaded wmfmariadbpy 0.8 to apt.wm.o
* 14:31 moritzm: installing rsync security updates on stretch
* 14:28 moritzm: installing xorg-server security updates on stretch
* 14:10 moritzm: installing vim security updates on stretch
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18780 and previous config saved to /var/cache/conftool/dbconfig/20220118-140540-marostegui.json
* 13:55 XioNoX: update grafana-plugins on grafana hosts - [[phab:T251184|T251184]]
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18779 and previous config saved to /var/cache/conftool/dbconfig/20220118-135036-marostegui.json
* 13:46 XioNoX: add grafana-plugins 0.3 (with worldmap plugin) to reprepo
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18778 and previous config saved to /var/cache/conftool/dbconfig/20220118-133531-marostegui.json
* 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:26 Lucas_WMDE: UTC morning backport window done
* 13:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:754605{{!}}Monitoring: Add '.Save' to distinguish from '.Click' events (T286366)]] (duration: 00m 54s)
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18777 and previous config saved to /var/cache/conftool/dbconfig/20220118-132026-marostegui.json
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:14 moritzm: installing python-babel security updates on buster
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18776 and previous config saved to /var/cache/conftool/dbconfig/20220118-131215-marostegui.json
* 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18775 and previous config saved to /var/cache/conftool/dbconfig/20220118-131208-marostegui.json
* 13:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
* 13:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
* 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:04 ayounsi@deploy1002: Finished deploy [homer/deploy@0f02386]: update requirements (duration: 01m 27s)
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:02 ayounsi@deploy1002: Started deploy [homer/deploy@0f02386]: update requirements
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753969{{!}}fawiki: Add flow-delete right to eliminators (T299223)]] (duration: 00m 51s)
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18774 and previous config saved to /var/cache/conftool/dbconfig/20220118-125703-marostegui.json
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:52 moritzm: installing ghostscript security updates for stretch
* 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754613{{!}}azwiki: Add draft namespace (T299332)]] (duration: 00m 51s)
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18773 and previous config saved to /var/cache/conftool/dbconfig/20220118-124159-marostegui.json
* 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: [[gerrit:754129{{!}}Post-edit dialog: Reload page upon dialog closing for structured tasks (T299188)]] (duration: 00m 51s)
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754612{{!}}commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist (T299247)]] (duration: 00m 51s)
* 12:27 moritzm: imported docker-report bullseye rebuild to apt.wikimedia.org [[phab:T298463|T298463]]
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18772 and previous config saved to /var/cache/conftool/dbconfig/20220118-122654-marostegui.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18771 and previous config saved to /var/cache/conftool/dbconfig/20220118-122546-marostegui.json
* 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18770 and previous config saved to /var/cache/conftool/dbconfig/20220118-122538-marostegui.json
* 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18769 and previous config saved to /var/cache/conftool/dbconfig/20220118-121034-marostegui.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18768 and previous config saved to /var/cache/conftool/dbconfig/20220118-115529-marostegui.json
* 11:46 hashar: Rolled back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0 # [[phab:T299389|T299389]]
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18767 and previous config saved to /var/cache/conftool/dbconfig/20220118-114024-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18766 and previous config saved to /var/cache/conftool/dbconfig/20220118-113916-marostegui.json
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:28 Amir1: mwscript findBadBlobs.php --wiki=dewiki --revisions {{Gerrit|5730218}} --mark "[[phab:T299387|T299387]]"
* 11:06 moritzm: running gnt-cluster renew-crypto --new-node-certificates for ganeti/eqiad cluster following 2.16 update
* 11:06 mmandere: start rolling upgrade to varnish 6.0.9 [[phab:T298758|T298758]]
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1117.eqiad.wmnet with OS bullseye
* 10:46 moritzm: gnt-cluster upgrade --to 2.16 for ganeti/eqiad cluster
* 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1117.eqiad.wmnet with OS bullseye
* 10:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:01 moritzm: running gnt-cluster renew-crypto --new-cluster-certificate --new-rapi-certificate --new-spice-certificate for ganeti/eqiad cluster
* 10:00 marostegui: Move pc1014 to pc3 [[phab:T299046|T299046]]
* 09:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc2 [[phab:T299046|T299046]] (duration: 00m 50s)
* 09:50 taavi: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "QuiteUnusual" "MarcGarver" # [[phab:T298707|T298707]]
* 09:50 moritzm: installing ganeti 2.16.0-1~bpo9+1+wmf1 on ganeti/eqiad servers [[phab:T296721|T296721]]
* 09:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:41 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:752344{{!}}Enable temporary global user groups on production (T153815)]] (duration: 00m 51s)
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes: Backport: [[gerrit:754602{{!}}page: Use MainObjectStash instead of 'db-replicated' cache (T272512)]] (duration: 00m 56s)
* 09:31 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/extension.json: Backport: [[gerrit:754145{{!}}Disable "inline-media-caption" category (T297443)]] (duration: 00m 51s)
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:06 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes/watcheditem/WatchedItemStore.php: Backport: [[gerrit:754599{{!}}watcheditem: Try getting the cached version in resetNotificationTimestamp]] (duration: 00m 51s)
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bullseye
* 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 08:55 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
* 08:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
* 08:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:37 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/ProofreadPage/includes/Page/PageContentHandler.php: Backport: [[gerrit:754598{{!}}Use fillParserOutputInternal instead of getParserOutput. (T292300)]] (duration: 00m 51s)
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bullseye
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:30 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc2 [[phab:T299046|T299046]] (duration: 00m 51s)
* 08:20 Amir1: cleaning up commons linter errors [[phab:T298782|T298782]]
* 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:12 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/includes/RecordLintJob.php: Backport: [[gerrit:754144{{!}}Drop 'inline-media-caption' lint requests (T297443 T299302)]] (duration: 00m 52s)
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bullseye
* 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 06:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 06:13 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
* 05:59 kart_: Update apertium to 2022-01-18-052631-production ([[phab:T218184|T218184]], [[phab:T202276|T202276]], [[phab:T218184|T218184]], [[phab:T270061|T270061]], [[phab:T248653|T248653]], [[phab:T248293|T248293]], [[phab:T248812|T248812]], [[phab:T248654|T248654]])
* 05:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync on production
* 05:54 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
* 05:54 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply on production
* 05:54 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
* 05:54 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply on production
* 05:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: sync on production
* 05:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply on staging
* 05:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply on production
* 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: sync on staging
* 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply on production
* 05:49 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply on staging
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18764 and previous config saved to /var/cache/conftool/dbconfig/20220118-054659-marostegui.json
* 02:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== 2015-12-25 ==
== 2022-01-17 ==
* 15:50 jynus: testing new mariadb packages on db2070
* 23:27 jynus: forced session revocation on phab for a user [[phab:T299315|T299315]]
* 13:19 jynus: setting db2018's binlog_format as MIXED
* 20:48 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided) (duration: 00m 02s)
* 10:51 jynus: powercycle cp3010
* 20:48 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided)
* 09:17 jynus: powercycling cp4007 (unresponsive to ssh, ping, serial console)
* 18:47 krinkle@deploy1002: Finished deploy [integration/docroot@1621c26]: (no justification provided) (duration: 01m 14s)
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Dec 25 02:29:49 UTC 2015 (duration 6m 56s)
* 18:46 krinkle@deploy1002: Started deploy [integration/docroot@1621c26]: (no justification provided)
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s)
* 16:30 moritzm: installing python-virtualenv bugfix updates from bullseye 11.2 point release
* 16:21 moritzm: installing wget bugfix updates from bullseye 11.2 point release
* 16:13 moritzm: installing freeipmi bugfix updates from bullseye 11.2 point release
* 16:02 moritzm: installing curl bugfix updates from bullseye 11.2 point release
* 15:54 mutante: mw1414,mw1415,mw1416,mw1417,mw1418,mw1447,mw1448,mw1449,mw1450,mw1437,mw1438 (all canaries eqiad) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 15:46 mutante: parse2001, parse2002, wtp1025, wtp1026 (all parsoid canaries) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 15:40 mutante: mw2278, mw2279, mw2374, mw2376 (API and jobrunner canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 15:34 mutante: mw2271, mw2272, mw2251, mw2252 (appserver and API canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* ([[phab:T294378|T294378]])
* 15:01 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1003.eqiad.wmnet
* 14:58 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1003.eqiad.wmnet
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2132.codfw.wmnet with OS bullseye
* 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1002.eqiad.wmnet
* 14:48 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1002.eqiad.wmnet
* 14:45 moritzm: imported cassandra 3.11.11 to component/cassandradev for stretch-wikimedia and buster-wikimedia [[phab:T298805|T298805]]
* 14:41 moritzm: systemctl reset-failed ifup@ens5.service on an-airflow1001 [[phab:T273026|T273026]]
* 14:39 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1001.eqiad.wmnet
* 14:37 hnowlan: removing restbase2009 from cassandra configs
* 14:30 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1001.eqiad.wmnet
* 14:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bullseye
* 14:15 marostegui: Reimage db2132 to Bullseye [[phab:T299344|T299344]]
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18762 and previous config saved to /var/cache/conftool/dbconfig/20220117-134520-marostegui.json
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1151.eqiad.wmnet with OS bullseye
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1151.eqiad.wmnet with OS bullseye
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2142.codfw.wmnet with OS bullseye
* 11:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2142.codfw.wmnet with OS bullseye
* 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon1002.eqiad.wmnet
* 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon1002.eqiad.wmnet
* 11:08 moritzm: switching kubetcd1006 to DRBD-backed storage (required for ganeti update)
* 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
* 11:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
* 11:00 moritzm: systemctl reset-failed ifup@ens5.service on kubetcd1005 [[phab:T273026|T273026]]
* 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18761 and previous config saved to /var/cache/conftool/dbconfig/20220117-104801-marostegui.json
* 10:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 10:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bullseye
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18760 and previous config saved to /var/cache/conftool/dbconfig/20220117-104459-marostegui.json
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1153.eqiad.wmnet with OS bullseye
* 10:42 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 10:32 moritzm: switching kubetcd1005 to DRBD-backed storage (required for ganeti update)
* 10:31 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync on staging
* 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
* 10:30 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply on production
* 10:30 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply on staging
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18759 and previous config saved to /var/cache/conftool/dbconfig/20220117-102954-marostegui.json
* 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bullseye
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1153.eqiad.wmnet with OS bullseye
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18758 and previous config saved to /var/cache/conftool/dbconfig/20220117-101450-marostegui.json
* 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2144.codfw.wmnet with OS bullseye
* 10:04 moritzm: switching kubetcd1004 to DRBD-backed storage (required for ganeti update)
* 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
* 10:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2143.codfw.wmnet with OS bullseye
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18757 and previous config saved to /var/cache/conftool/dbconfig/20220117-095945-marostegui.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18756 and previous config saved to /var/cache/conftool/dbconfig/20220117-095837-marostegui.json
* 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 09:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18755 and previous config saved to /var/cache/conftool/dbconfig/20220117-095830-marostegui.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18754 and previous config saved to /var/cache/conftool/dbconfig/20220117-094325-marostegui.json
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2144.codfw.wmnet with OS bullseye
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2143.codfw.wmnet with OS bullseye
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18753 and previous config saved to /var/cache/conftool/dbconfig/20220117-092820-marostegui.json
* 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1017.eqiad.wmnet with OS bullseye
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18752 and previous config saved to /var/cache/conftool/dbconfig/20220117-091316-marostegui.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18751 and previous config saved to /var/cache/conftool/dbconfig/20220117-091308-marostegui.json
* 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18750 and previous config saved to /var/cache/conftool/dbconfig/20220117-091300-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18749 and previous config saved to /var/cache/conftool/dbconfig/20220117-085756-marostegui.json
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1017.eqiad.wmnet with OS bullseye
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18748 and previous config saved to /var/cache/conftool/dbconfig/20220117-084251-marostegui.json
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1003.eqiad.wmnet
* 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1003.eqiad.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18747 and previous config saved to /var/cache/conftool/dbconfig/20220117-082746-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18746 and previous config saved to /var/cache/conftool/dbconfig/20220117-082638-marostegui.json
* 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1004.eqiad.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1004.eqiad.wmnet
* 06:59 elukey: `systemctl reset-failed ifup@ens5.service` on an-test-client1001 and kafka-test1010
* 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1016.eqiad.wmnet with OS bullseye
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1016.eqiad.wmnet with OS bullseye


== 2015-12-24 ==
== 2022-01-16 ==
* 23:28 mutante: powercycled mw1114
* 08:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 23:27 mutante: i just reset dra on mw1114 because it said it was in use and i didn't see a log yet :;p
* 08:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 23:26 robh: mw1114 spammed all icinga errors, system is outputting endless scroll of login prompt, not halting for input (like another session or crash cart is sending it, or an error)
* 08:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
* 17:31 gwicke: aqs: tweaked table properties for local_group_default_T_pageviews_per_article_flat: 2 months max DTCS window size, deflate compression
* 08:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 17:20 jynus: restarting and reconfiguring mysql at db2066
* 08:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
* 16:45 jynus: restart and reconfigure mysql at db2059
* 08:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
* 16:11 jynus: restart and mysql reconfiguration of db2052
* 15:02 jynus: restarting and reconfiguring mysql at db2045
* 14:11 paravoid: powercycling mw1012, OOM'ed/stuck
* 14:09 paravoid: rolling restart of hhvm jobrunners (T122069)
* 14:06 jynus: restart and reconfigure mysql at db2038
* 12:32 jynus: restart and reconfigure mysql at db2065
* 12:12 jynus: restart and reconfiguring mysql for db2058
* 11:50 jynus: restarting and reconfiguring mysql at db2051
* 11:28 jynus: restarting and reconfiguring mysql at db2044
* 10:33 jynus: restarting 's2' replication on dbstore200[12] after cloning
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 24 02:30:40 UTC 2015 (duration 6m 52s)
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 03s)
* 01:05 awight: update payments from bae4d02afd8cfe1f8b8617c2f74bb36e420d281d to a7785baa7b40b442ecf0b60d47572502d0759780
* 00:38 gwicke: restbase1003: starting `nodetool cleanup`


== 2015-12-23 ==
== 2022-01-15 ==
* 23:31 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/Graph/modules/ve-graph: https://gerrit.wikimedia.org/r/#/c/260868/ (duration: 00m 31s)
* 08:55 legoktm: finished running recountCategories on s4 wikis ([[phab:T299244|T299244]])
* 19:59 mutante: restbase1004 - puppet stopped and host key changed, what's up?
* 07:58 legoktm: finished running recountCategories on s7 wikis ([[phab:T299244|T299244]])
* 19:42 mutante: ran puppet on mw2112
* 07:51 legoktm: finished running recountCategories on s2 wikis ([[phab:T299244|T299244]])
* 19:41 mutante: logstash1002 - started logstash service
* 06:41 <legoktm>: finished running recountCategories on s3 wikis ([[phab:T299244|T299244]])
* 19:22 logmsgbot: aaron@tin Synchronized wmf-config/CommonSettings.php: Remove unused $wgMaxSquidPurgeTitles setting (duration: 00m 30s)
* 06:21 <legoktm>: finished running recountCategories on s6 wikis ([[phab:T299244|T299244]])
* 18:55 ejegg: updated fundraising tools from ebed29c0eccf38c812b20a957b3487a15bfa9cbc to 1bc23cb4bfaf2a9d4d215aad79dd67d891b5d973
* 06:19 <legoktm>: finished running recountCategories on s5 wikis ([[phab:T299244|T299244]])
* 18:51 ejegg: updated fundraising dashboard from 59e51c4ff74c3c584daf6c5de3bb66daa764cd28 to af8a493ab9ac5431e0d294e5019ac4e426ac6e08
* 06:18 <legoktm>: finished running recountCategories on s8 wikis ([[phab:T299244|T299244]])
* 18:38 jynus: restart and reconfigure mysql at db2037
* 06:14 legoktm: running recountCategories on s3 wikis
* 18:17 mutante: bohrium - finish install, signing puppet certs
* 05:20 legoktm: started recountCategories.php --wiki=enwiki --mode pages ([[phab:T299244|T299244]])
* 17:11 jynus: cloning s2 databases from dbstore2001 to dbstore2002 (s2 replication disabled on both)
* 03:05 legoktm: started refreshLinks --dfn-only via systemd units for s7-s8 ([[phab:T299244|T299244]])
* 14:05 jynus: restart and reconfigure mysql at db2064
* 03:01 legoktm: started refreshLinks --dfn-only via systemd units for s2-s6 ([[phab:T299244|T299244]])
* 13:26 jynus: restart and reconfigure mysql at db2063
* 02:55 legoktm: started mwscript refreshLinks.php --wiki=commonswiki --dfn-only ([[phab:T299244|T299244]])
* 12:41 jynus: reenabling event scheduler on db1046 (eventlogging m4-master)
* 02:54 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only ([[phab:T299244|T299244]])
* 12:30 jynus: restart and reconfigure mysql at db2056
* 02:52 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only
* 12:14 godog: upgrade cassandra on aqs1003
* 01:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:24 jynus: reloading and reconfiguring mysql on db2049
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:11 godog: roll-upgrade cassandra to 2.1.12 on aqs100[123]
* 01:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:55 jynus: rebooting and reconfiguring mysql on db2041
* 01:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:11 jynus: restarting and reconfiguring mysql at db2035
* 01:04 legoktm: starting recountCategories.php --mode pages --wiki enwiki on mwmaint1002
* 09:52 gwicke: rebuilding restbase1004
* 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:51 gwicke: wiped & started bootstrap on restbase1008
* 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:18 gwicke: nodetool removenode e2813bb9-f1f2-4d21-ac19-95a7a35b4513 in preparation for adding 1004 to the cluster without bootstrap
* 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Dec 23 02:30:25 UTC 2015 (duration 7m 1s)
* 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 18s)
* 00:58 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:40 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/260696 & https://gerrit.wikimedia.org/r/260699 (duration: 05m 28s)
* 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:37 mutante: mw1133 - powercycle
* 00:52 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 00m 52s)
* 00:36 legoktm: manually fixed up stuck global rename of "RCJU-ArCJ" -> "Archives cantonales jurassiennes"
* 00:51 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:31 matt_flaschen: Ran UPDATE flow_workflow SET workflow_page_id = 41854369 WHERE workflow_wiki = 'enwiki' AND workflow_namespace = 5 AND workflow_title_text = 'Flow/Developer_test_page' AND workflow_page_id = 48099373; to work around DB inconsistency (T117812)
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:46 jforrester@deploy1002: Finished scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]] (duration: 03m 58s)
* 00:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:42 jforrester@deploy1002: Started scap: Revert "LinksUpdate refactor" and follow-ups for [[phab:T299244|T299244]] re. [[phab:T293958|T293958]]
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17"


== 2015-12-22 ==
== 2022-01-14 ==
* 21:46 gwicke: restbase1004: tune2fs -m 0 /dev/mapper/restbase1004--vg-srv
* 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch
* 21:45 gwicke: restbase1004: restarted bootstrap
* 22:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 21:22 gwicke: restbase1003: restarting cassandra to clear up disk space from old stream
* 18:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 21:11 gwicke: restbase1008: restarting cassandra to clear up disk space from old stream
* 18:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 18:36 robh: silver returned to normal service, wikitech.w.o certificate renewed.
* 17:44 bblack: drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue
* 18:26 robh: silver puppet staying stalled during toollabs issue (we don't want to rehup silver web service)
* 17:26 bblack: reboot lvs600[23]
* 18:17 robh: puppet disabled on silver, going to update wikitech.wikimedia.org certificate
* 16:55 bblack: reboot lvs6001
* 18:10 jynus: disabling event scheduling on db1046
* 16:30 bblack: rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed)
* 18:03 jynus: rolling schema change (ALTER TABLE ENGINE=TokuDB) on m4-master (db1046) log (eventlogging)
* 16:15 dancy@deploy1002: Synchronized README: Testing php-fpm restart (duration: 03m 18s)
* 16:44 godog: bounce cassandra on restbase1004, restart bootstrap
* 16:04 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 16:42 mutante: powercycling crashed mw1144
* 15:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 16:41 jynus: converting dbstore2001 (delayed slave) into an actual delayed slave, adding redundancy to dbstore1002
* 15:39 bblack: lvs6001 + all services downtimed
* 16:40 godog: bounce cassandra on restbase1003
* 15:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: dc=drmrs
* 16:15 akosiaris: upgrade cassandra on maps-test2001
* 15:00 bblack: silenced site=drmrs in alertmanager for one month, I think
* 16:15 akosiaris: upgrade cassandra on maps-test2002
* 15:00 bblack: silenced site=drmrs in alertmanager, I think
* 15:53 mutante: kafka1001,1002 - crit - eventlogging not running (?)
* 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye
* 15:52 mutante: restbase1003 - disk space, restbase1008 - disk space, restbase1004 - cassandra cql refused
* 13:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:23 akosiaris: upgrade cassandra on maps-test2003
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye
* 15:06 jynus: restarting and reconfiguring mysql at dbstore2001
* 12:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:06 mutante: labtestcontrol2001 - puppet had not been running for a while, a bunch of changes have been applied incl. keys and passwords
* 12:51 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:04 mutante: enabling puppet on labtestcontrol2001
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster
* 15:04 akosiaris: upgraded cassandra on maps-test2004
* 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster
* 11:54 apergos: salt packages with wmf packages precise running on ms-{bf}e* in esams; trusty running on analytics103* in eqiad; jessie running on restbase2* in codfw
* 12:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 11:43 godog: restart cassandra bootstrap on restbase1004
* 12:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 10:09 jynus: online resizing /srv/postgres on labsdb1006 +100GB
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 10:06 hashar: Restarting Jenkins
* 11:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 09:54 apergos: precise and trusty salt packages with wmf patches deployed manually on dataset1001 and analytics1001, seem to work fine
* 11:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
* 08:42 jynus: restarting and reconfiguring mysql at db2036
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Dec 22 02:30:28 UTC 2015 (duration 6m 54s)
* 11:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s)
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 00:29 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/260492/ (duration: 00m 32s)
* 11:00 moritzm: systemctl reset-failed ifup@ens5.service on archiva1002 [[phab:T273026|T273026]]
* 00:22 logmsgbot: krenair@tin Synchronized php-1.27.0-wmf.9/extensions/SyntaxHighlight_GeSHi/modules/ve-syntaxhighlight/ve.ui.MWSyntaxHighlightDialogTool.js: https://gerrit.wikimedia.org/r/#/c/260429/ (duration: 00m 30s)
* 10:56 moritzm: rebooting archiva1002 (running archiva.wikimedia.org)
* 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 10:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 10:50 moritzm: systemctl reset-failed ifup@ens5.service on an-test-ui1001 [[phab:T273026|T273026]]
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet
* 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 10:05 moritzm: rebooting matomo1002 (running piwik.wikimedia.org)
* 10:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet
* 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org
* 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org
* 09:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install1003.wikimedia.org
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-client1001.eqiad.wmnet
* 09:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-client1001.eqiad.wmnet
* 09:11 marostegui: Move pc1014 from pc1 to pc2 [[phab:T299046|T299046]]
* 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1009.eqiad.wmnet
* 09:01 moritzm: rebooting an-tool1009 (running hue.wikimedia.org)
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1009.eqiad.wmnet
* 09:00 moritzm: systemctl reset-failed ifup@ens5.service on an-tool1005 [[phab:T273026|T273026]]
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1008.eqiad.wmnet
* 08:58 moritzm: rebooting an-tool1008 (running yarn.wikimedia.org)
* 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1008.eqiad.wmnet
* 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1007.eqiad.wmnet
* 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1007.eqiad.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1005.eqiad.wmnet
* 08:51 moritzm: rebooting an-tool1007 (running turnilo.wikimedia.org)
* 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1005.eqiad.wmnet
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bullseye
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bullseye
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bullseye
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18735 and previous config saved to /var/cache/conftool/dbconfig/20220114-063554-marostegui.json
* 06:15 marostegui: Failover m5 proxy from dbproxy1017 to dbproxy1021 [[phab:T298586|T298586]]
* 05:16 legoktm: manually restarted discard_held_messages service on lists1001, failed with a spurious sqlalchemy issue about packets being out of order
* 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:15 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:09 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/includes/content/WikitextContentHandler.php: Backport: [[gerrit:753828{{!}}In WikitextContentHandler always use getFreshParser() (T299149)]] (duration: 01m 07s)


== 2015-12-21 ==
== 2022-01-13 ==
* 22:41 cwd: updated paymentswiki from a1be1ad134d06464e98de180227554fceddc91d4 to bae4d02afd8cfe1f8b8617c2f74bb36e420d281d
* 22:40 WFan: Updating payment-wiki, revision changed from {{Gerrit|8497eae9}} to {{Gerrit|5cc9d5e0}}
* 20:49 godog: restbase1004 bootstrap failed, restbase1007-a is down java.lang.RuntimeException: A node required to move the data consistently is down (/10.64.0.230).
* 22:18 dzahn@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=miscweb
* 19:27 legoktm: running checkLocalUser.php --delete=1 for real this time on terbium
* 22:00 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
* 19:22 godog: reimage restbase1004
* 21:48 mutante: running puppet on cp-ulsfo
* 19:14 paravoid: powercycling mw1011
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 paravoid: rolling restart of hhvm on the eqiad jobrunners
* 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:47 jynus: common-sync: Copying to mw1016.eqiad.wmnet from tin.eqiad.wmnet
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:35 ori: correction: previous log message was for mw1015, not mw1017
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:27 ori: mw1017: enabled jemalloc profiling, restarted hhvm, now running hhvm-collect-heaps
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:48 akosiaris: restarted hhvm on mw1012.eqiad.wmnet
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:57 thcipriani: timeout on sync-file to mw1016.eqiad.wmnet
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:56 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/Popups/Popups.hooks.php: SWAT: Use ExtensionRegistry to determine whether TextExtracts is installed [[gerrit:260346]] (duration: 02m 48s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:34 jynus: sync-common to mw1085
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:26 jynus: powercycling mw1085.eqiad.wmnet
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:22 thcipriani: mw1085.eqiad.wmnet times out on SSH connection
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:19 godog: reboot restbase1007, load through the roof
* 20:31 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.17"
* 16:18 logmsgbot: thcipriani@tin Synchronized php-1.27.0-wmf.9/extensions/CentralNotice/resources/subscribing/ext.centralNotice.geoIP.js: SWAT: Update CentralNotice [[gerrit:260316]] (duration: 03m 03s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:08 godog: depool restbase1007
* 20:29 dduvall: rolling back wmf.17 from group1 due to a large increase in "Parser state cleared while parsing" across commons and group1 wikipedias ([[phab:T293958|T293958]], [[phab:T299149|T299149]])
* 16:01 apergos: jessie packages for salt with local patches deployed on restbase1001, looks fine but just in case.
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 15:44 godog: adding new 1TB disk to restbase1007
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:22 andrewbogott: disabling puppet on labnet1002 for dnsmasq tests
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:07 MaxSem: me and yurik are nuking old maps data and reimporting planet
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:46 jynus: extending online s2-master data disk by +100GB
* 20:17 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 06s)
* 13:15 akosiaris: disabled puppet on maps-test2001 and commented out osmupdater crontab entry until we fix the sync process
* 20:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 11:02 jynus: emergency restart of db1047's mysql
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:54 jynus: reenabling semisync replication on s3
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:07 godog: stop cassandra on restbase1004, decommissioned
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Dec 21 02:29:51 UTC 2015 (duration 6m 47s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 45s)
* 20:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 02:20 andrewbogott: disabling puppet on labnet1002 to mess with dnsmasq
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 01:44 andrewbogott: disabled puppet on holmium and labservices1001 to control roll-out of https://gerrit.wikimedia.org/r/#/c/260037/
* 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 19:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2051.codfw.wmnet with OS stretch
* 19:40 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
* 19:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753634{{!}}Enable ArticlePlaceholder on dagwiki (T298349)]] (duration: 01m 13s)
* 19:37 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
* 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
* 19:23 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747993{{!}}Add event stream config for ios.notification_interaction (T290920)]] (duration: 01m 13s)
* 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747991{{!}}Add event stream config for android.customize_toolbar_interaction (T297818)]] (duration: 01m 12s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:07 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753793{{!}}Enable skin migration mode on the beta cluster]] (duration: 01m 14s)
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
* 17:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:34 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 17:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 17:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 17:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 16:27 moritzm: import maps-deduped-tilelist 0.0.5 to buster-wikimedia/main [[phab:T297408|T297408]]
* 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
* 16:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
* 15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:50 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aphlict1001.eqiad.wmnet
* 15:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aphlict1001.eqiad.wmnet
* 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM flowspec1001.eqiad.wmnet
* 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM flowspec1001.eqiad.wmnet
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
* 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
* 15:23 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
* 15:21 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2009.codfw.wmnet with OS buster
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM seaborgium.wikimedia.org
* 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM seaborgium.wikimedia.org
* 15:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1002.wikimedia.org
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1002.wikimedia.org
* 14:56 mmandere: cp3053: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
* 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1001.wikimedia.org
* 14:47 moritzm: systemctl reset-failed ifup@ens5.service on idp1001 [[phab:T273026|T273026]]
* 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp1001.wikimedia.org
* 14:15 moritzm: switch ml-etcd1003 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 14:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
* 13:53 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet
* 13:49 moritzm: switch ml-etcd1002 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
* 13:45 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet
* 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1001.wikimedia.org
* 13:33 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1001.wikimedia.org
* 13:23 moritzm: switch ml-etcd1001 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1001-dev.eqiad.wmnet
* 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1001-dev.eqiad.wmnet
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18731 and previous config saved to /var/cache/conftool/dbconfig/20220113-124307-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s3 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18730 and previous config saved to /var/cache/conftool/dbconfig/20220113-124300-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s3 codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18729 and previous config saved to /var/cache/conftool/dbconfig/20220113-124140-marostegui.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P18728 and previous config saved to /var/cache/conftool/dbconfig/20220113-123744-marostegui.json
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1002-dev.eqiad.wmnet
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18727 and previous config saved to /var/cache/conftool/dbconfig/20220113-122803-root.json
* 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp1001.wikimedia.org
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp1001.wikimedia.org
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18726 and previous config saved to /var/cache/conftool/dbconfig/20220113-121300-root.json
* 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM eventlog1003.eqiad.wmnet
* 11:59 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM eventlog1003.eqiad.wmnet
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18725 and previous config saved to /var/cache/conftool/dbconfig/20220113-115756-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18724 and previous config saved to /var/cache/conftool/dbconfig/20220113-114252-root.json
* 11:34 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18723 and previous config saved to /var/cache/conftool/dbconfig/20220113-112749-root.json
* 11:26 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
* 11:26 _joe_: update scap everywhere [[phab:T298986|T298986]]
* 11:25 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 09s)
* 11:25 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
* 11:24 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: (no justification provided) (duration: 00m 09s)
* 11:23 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: (no justification provided)
* 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1001.eqiad.wmnet
* 11:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2022.codfw.wmnet with OS bullseye
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1001.eqiad.wmnet
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18722 and previous config saved to /var/cache/conftool/dbconfig/20220113-111245-root.json
* 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
* 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1001.wikimedia.org
* 11:08 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
* 11:03 moritzm: rebooting netbox1001 (running netbox.wikimedia.org)
* 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1001.wikimedia.org
* 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1001.eqiad.wmnet with OS buster
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb1001.eqiad.wmnet
* 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb1001.eqiad.wmnet
* 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18721 and previous config saved to /var/cache/conftool/dbconfig/20220113-105741-root.json
* 10:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
* 10:52 hashar: Restarting Jenkins CI for plugins update [[phab:T298691|T298691]]
* 10:47 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader1001.eqiad.wmnet
* 10:45 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader1001.eqiad.wmnet
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2022.codfw.wmnet with OS bullseye
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18720 and previous config saved to /var/cache/conftool/dbconfig/20220113-104238-root.json
* 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1001.wikimedia.org
* 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1001.eqiad.wmnet with OS buster
* 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc1001.wikimedia.org
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18719 and previous config saved to /var/cache/conftool/dbconfig/20220113-102734-root.json
* 10:27 moritzm: systemctl reset-failed ifup@ens5.service on lists1001 [[phab:T273026|T273026]]
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana1002.eqiad.wmnet
* 10:10 moritzm: rebooting grafana1002 (running grafana.wikimedia.org)
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana1002.eqiad.wmnet
* 10:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 10:02 mmandere: cp3052: upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 10:02 joal@deploy1002: Finished deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] (duration: 21m 47s)
* 10:02 elukey: run kafka preferred-replica-election on kafka-main1001 to force a rebalance of partition leaders (after kafka-main1002's reimage)
* 10:00 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
* 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1002.eqiad.wmnet with OS buster
* 09:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
* 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386]
* 09:40 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] (duration: 00m 07s)
* 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386]
* 09:39 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] (duration: 06m 59s)
* 09:35 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386]
* 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1002.eqiad.wmnet with OS buster
* 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui1001.eqiad.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui1001.eqiad.wmnet
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM lists1001.wikimedia.org
* 09:02 moritzm: rebooting lists1001 (running lists.wikimedia.org) to pick up new KVM setting
* 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022, give weight to es1021 [[phab:T295965|T295965]] ', diff saved to https://phabricator.wikimedia.org/P18718 and previous config saved to /var/cache/conftool/dbconfig/20220113-085906-marostegui.json
* 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1003.eqiad.wmnet with OS buster
* 08:39 elukey: ipmi mc reset cold for kafka-main1002, mgmt interface not reachable via ssh
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18717 and previous config saved to /var/cache/conftool/dbconfig/20220113-083923-marostegui.json
* 08:28 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753505{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:753504{{!}}Take LogicException into consideration (T299111)]] (duration: 01m 28s)
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1003.eqiad.wmnet with OS buster
* 08:06 marostegui: Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) [[phab:T287244|T287244]]
* 08:02 elukey: ipmi mc reset cold for kafka-main1003, mgmt interface not reachable via ssh
* 07:57 elukey: stop kafka* on kafka-main1003 as prep-step for reimage to buster
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18715 and previous config saved to /var/cache/conftool/dbconfig/20220113-075012-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1015.eqiad.wmnet with OS bullseye
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1015.eqiad.wmnet with OS bullseye
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/includes/export/WikiExporter.php: Backport: [[gerrit:753501{{!}}export: Remove ignoring rev_page_id index (T163532)]] (duration: 01m 28s)
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18714 and previous config saved to /var/cache/conftool/dbconfig/20220113-064113-root.json
* 06:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 06:38 marostegui: Failover m3 proxy from dbproxy1016 to dbproxy1020 [[phab:T298586|T298586]]
* 06:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 06:26 marostegui: Remove rev_page_id from frwiki,jawiki,ruwiki and labswiki from db1096 (s6) [[phab:T285149|T285149]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18713 and previous config saved to /var/cache/conftool/dbconfig/20220113-062609-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18712 and previous config saved to /var/cache/conftool/dbconfig/20220113-061105-root.json
* 06:05 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 27s)
* 05:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18711 and previous config saved to /var/cache/conftool/dbconfig/20220113-055602-root.json
* 05:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 05:53 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/tests/phpunit/unit/includes/libs/rdbms/database/DatabaseSQLTest.php: (no justification provided) (duration: 01m 32s)
* 05:00 TimStarling: doing [[phab:T299095|T299095]] restorations on s3 wikis
* 04:30 TimStarling: on mwmaint1002: inserting 11565 rows into itwiki.pagelinks for [[phab:T299095|T299095]]
* 03:33 TimStarling: on mwmaint1002: inserting 1714288 rows into wikidatawiki.pagelinks for [[phab:T299095|T299095]]
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:30 TimStarling: on mwmaint1002: inserting 4221344 rows into commonswiki.pagelinks to clean up from [[phab:T299095|T299095]]
* 02:29 tstarling@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/sql.php: batch size (duration: 01m 28s)
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752751{{!}}Enable CirrusSearch on it/en Wikivoyage]] (duration: 01m 28s)
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752760{{!}}Skip vector-2022 skin in config, not Vector skin (T298923)]] (duration: 01m 29s)
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753584{{!}}Enable Disambiguator notifications on all wikis (T293319)]] (duration: 01m 28s)
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn


== 2015-12-20 ==
== 2022-01-12 ==
* 23:24 Reedy: Katie and Jeff paged about bellatrix
* 23:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:46 andrewbogott: graceful restart of zuul as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:31 andrewbogott: restarting stuck Jenkins
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:47 logmsgbot: reedy@tin Purged l10n cache for 1.27.0-wmf.6
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:11 godog: depool mw1228, reported ro fs
* 23:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:53 logmsgbot: reedy@tin Synchronized README: noop (duration: 00m 32s)
* 23:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.17
* 15:50 Reedy: reedy@tin Purged l10n cache for 1.27.0-wmf.6 (hanging due to mw1228 issue)
* 23:07 jhathaway: rebooting mx1001 to get old kernel
* 15:42 Reedy: mw1228 reporting readonly fs
* 22:48 cwhite: end eqiad opensearch upgrade [[phab:T288621|T288621]]
* 15:41 logmsgbot: reedy@tin Purged l10n cache for 1.27.0-wmf.7
* 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18709 and previous config saved to /var/cache/conftool/dbconfig/20220112-214258-marostegui.json
* 09:00 godog: powercycle ms-be2019, xfs lockup
* 21:28 mbsantos: mbsantos@maps1009.eqiad.wmnet: start imposm-initial-import  - full planet re-import ([[phab:T299049|T299049]])
* 02:28 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Dec 20 02:28:49 UTC 2015 (duration 6m 54s)
* 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18708 and previous config saved to /var/cache/conftool/dbconfig/20220112-212753-marostegui.json
* 02:21 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 59s)
* 21:19 ryankemper: [WDQS] [[phab:T299098|T299098]] depooled `wdqs2003` so dc-ops can take a look at the PS2 failure
* 21:18 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2] (duration: 06m 57s)
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18707 and previous config saved to /var/cache/conftool/dbconfig/20220112-211248-marostegui.json
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2]
* 21:11 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2] (duration: 00m 07s)
* 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2]
* 21:10 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2] (duration: 24m 20s)
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18706 and previous config saved to /var/cache/conftool/dbconfig/20220112-205744-marostegui.json
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18705 and previous config saved to /var/cache/conftool/dbconfig/20220112-205636-marostegui.json
* 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18704 and previous config saved to /var/cache/conftool/dbconfig/20220112-205629-marostegui.json
* 20:46 joal@deploy1002: Started deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2]
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18703 and previous config saved to /var/cache/conftool/dbconfig/20220112-204124-marostegui.json
* 20:36 dduvall: 1.38.0-wmf.17 rolled back from group1 due to large spike in db read-only errors and slow queries ([[phab:T293958|T293958]])
* 20:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.38.0-wmf.17
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18702 and previous config saved to /var/cache/conftool/dbconfig/20220112-202619-marostegui.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:21 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 01m 21s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:19 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:19 jgleeson: updated payments from {{Gerrit|939cb4bc}} to {{Gerrit|8497eae9}}
* 20:17 mutante: applying firewall change on phabricator (VCS, git-ssh), second attempt, first codfw-only
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18701 and previous config saved to /var/cache/conftool/dbconfig/20220112-201114-marostegui.json
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18700 and previous config saved to /var/cache/conftool/dbconfig/20220112-200806-marostegui.json
* 20:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18699 and previous config saved to /var/cache/conftool/dbconfig/20220112-200759-marostegui.json
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18698 and previous config saved to /var/cache/conftool/dbconfig/20220112-195254-marostegui.json
* 19:52 hashar: Restarting CI Jenkins once more to apply the Gearman plugin update [[phab:T298691|T298691]]
* 19:44 hashar: Clearing /srv partition on integration-castor03
* 19:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18697 and previous config saved to /var/cache/conftool/dbconfig/20220112-193749-marostegui.json
* 19:34 hashar: Upgrading CI Jenkins and Gearman plugin [[phab:T298691|T298691]]
* 19:29 mutante: wdqs2003 - one power supply failed so it's not redundant anymore, says Icinga
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 cwhite: begin eqiad opensearch upgrade [[phab:T288621|T288621]]
* 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18696 and previous config saved to /var/cache/conftool/dbconfig/20220112-192244-marostegui.json
* 19:22 mutante: deneb - for some reason the "package builder clean up build directory"-service fails [[phab:T287222|T287222]]
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:21 cjming: end of UTC evening backport & config window
* 19:21 mutante: [deneb:~] $ sudo systemctl start  package_builder_Clean_up_build_directory.service
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753187{{!}}Add new vector skin key to RelatedArticlesFooterAllowedSkins. (T298916)]] (duration: 01m 21s)
* 19:18 mutante: pybal-test2002 - apt-get clean after icinga alert about disk space running out
* 19:17 mutante: zookeeper-test1002 - CRITICAL - degraded: The following units failed: ifup@ens5.service - for this issue see [[phab:T273026|T273026]] ([[phab:T268074|T268074]])
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:14 mutante: elastic10180 - one power supply seemingly failed - see icinga IPMI alert - [Status = Critical, PS Redundancy = Critical] [[phab:T294805|T294805]]
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18695 and previous config saved to /var/cache/conftool/dbconfig/20220112-191436-marostegui.json
* 19:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18694 and previous config saved to /var/cache/conftool/dbconfig/20220112-191428-marostegui.json
* 19:13 cjming@deploy1002: Synchronized php-1.38.0-wmf.17/includes/export/WikiExporter.php: Backport: [[gerrit:753085{{!}}Partial revert of I1a691f01cd82e60bf41207d32501edb4b9835e37 to unbreak dumps (T299020)]] (duration: 01m 22s)
* 19:12 mutante: mirror1001 - CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service - [[phab:T286898|T286898]]
* 19:09 hashar: Upgraded releases Jenkins from 2.319.1 to 2.319.2 # [[phab:T298691|T298691]]
* 19:06 moritzm: imported jenkins 2.319.2 to thirdparty/ci for buster-wikimedia
* 19:05 mutante: [mwmaint1002:~] $ sudo systemctl status mediawiki_job_updatequerypages_mostlinked_s3@13.service (running fine but had failed for unknown reason last time it was supposed to run automatically)
* 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18693 and previous config saved to /var/cache/conftool/dbconfig/20220112-185923-marostegui.json
* 18:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18692 and previous config saved to /var/cache/conftool/dbconfig/20220112-184418-marostegui.json
* 18:40 mutante: phab1001 - temp disabling puppet - deployed firewall change on phab2001 - debugging - no impact
* 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18691 and previous config saved to /var/cache/conftool/dbconfig/20220112-182913-marostegui.json
* 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18690 and previous config saved to /var/cache/conftool/dbconfig/20220112-182806-marostegui.json
* 18:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18689 and previous config saved to /var/cache/conftool/dbconfig/20220112-182725-marostegui.json
* 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18688 and previous config saved to /var/cache/conftool/dbconfig/20220112-181220-marostegui.json
* 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18687 and previous config saved to /var/cache/conftool/dbconfig/20220112-175715-marostegui.json
* 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18686 and previous config saved to /var/cache/conftool/dbconfig/20220112-174211-marostegui.json
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18685 and previous config saved to /var/cache/conftool/dbconfig/20220112-174103-marostegui.json
* 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18684 and previous config saved to /var/cache/conftool/dbconfig/20220112-174056-marostegui.json
* 17:38 _joe_: deploying scap 4.1.1 to the restbase canaries [[phab:T298986|T298986]]
* 17:34 _joe_: deploying scap 4.1.1 to the mediawiki canaries [[phab:T298986|T298986]]
* 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bullseye
* 17:27 dancy@deploy1002: Started scap: testing
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18683 and previous config saved to /var/cache/conftool/dbconfig/20220112-172551-marostegui.json
* 17:25 dancy@deploy1002: Started scap: testing
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18682 and previous config saved to /var/cache/conftool/dbconfig/20220112-171047-marostegui.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 17:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 16:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1005.eqiad.wmnet
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18681 and previous config saved to /var/cache/conftool/dbconfig/20220112-165542-marostegui.json
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18680 and previous config saved to /var/cache/conftool/dbconfig/20220112-165434-marostegui.json
* 16:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1005.eqiad.wmnet
* 16:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 16:53 hnowlan: Decommissioning cassandra instance restbase2009-c via nodetool
* 16:48 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:46 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
* 16:45 elukey: elukey@prometheus2004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:44 elukey: elukey@prometheus2003:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:40 elukey: elukey@prometheus1004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
* 16:39 elukey: elukey@prometheus1003:~$ sudo apt-get remove linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64 linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18678 and previous config saved to /var/cache/conftool/dbconfig/20220112-163919-marostegui.json
* 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx1001.wikimedia.org
* 16:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1004.eqiad.wmnet
* 16:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx1001.wikimedia.org
* 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:31 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1004.eqiad.wmnet
* 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:25 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
* 16:25 elukey: stop kafka* on kafka-main1003 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18677 and previous config saved to /var/cache/conftool/dbconfig/20220112-162414-marostegui.json
* 16:20 moritzm: switch kubestagetcd1006 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18676 and previous config saved to /var/cache/conftool/dbconfig/20220112-160910-marostegui.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18675 and previous config saved to /var/cache/conftool/dbconfig/20220112-160802-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18674 and previous config saved to /var/cache/conftool/dbconfig/20220112-160755-marostegui.json
* 16:02 elukey: stop kafka* on kafka-main1002 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:57 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 15:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 15:56 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18673 and previous config saved to /var/cache/conftool/dbconfig/20220112-155250-marostegui.json
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18672 and previous config saved to /var/cache/conftool/dbconfig/20220112-153745-marostegui.json
* 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18671 and previous config saved to /var/cache/conftool/dbconfig/20220112-152240-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18670 and previous config saved to /var/cache/conftool/dbconfig/20220112-152133-marostegui.json
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18669 and previous config saved to /var/cache/conftool/dbconfig/20220112-152121-marostegui.json
* 15:14 elukey: stop kafka* on kafka-main1001 to allow dcops maintenance (nic/bios upgrades) - [[phab:T298867|T298867]]
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18668 and previous config saved to /var/cache/conftool/dbconfig/20220112-150616-marostegui.json
* 14:59 moritzm: switch kubestagetcd1005 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply on main
* 14:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18667 and previous config saved to /var/cache/conftool/dbconfig/20220112-145111-marostegui.json
* 14:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 14:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 14:40 jelto: remove helm2 from deployment_server [[phab:T251305|T251305]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/753026
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
* 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
* 14:37 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
* 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18666 and previous config saved to /var/cache/conftool/dbconfig/20220112-143606-marostegui.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18665 and previous config saved to /var/cache/conftool/dbconfig/20220112-143258-marostegui.json
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18664 and previous config saved to /var/cache/conftool/dbconfig/20220112-143241-marostegui.json
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:23 moritzm: switch kubestagetcd1004 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18663 and previous config saved to /var/cache/conftool/dbconfig/20220112-141736-marostegui.json
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part III (duration: 01m 07s)
* 14:15 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part II (duration: 01m 08s)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1002.eqiad.wmnet
* 14:14 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: [[gerrit:702421{{!}}Merge db-codfw.php and db-eqiad.php into db-production.php (T260297)]], Part I (duration: 01m 07s)
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1002.eqiad.wmnet
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1001.eqiad.wmnet
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18662 and previous config saved to /var/cache/conftool/dbconfig/20220112-140232-marostegui.json
* 14:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1001.eqiad.wmnet
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18661 and previous config saved to /var/cache/conftool/dbconfig/20220112-135858-marostegui.json
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18659 and previous config saved to /var/cache/conftool/dbconfig/20220112-134727-marostegui.json
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18658 and previous config saved to /var/cache/conftool/dbconfig/20220112-134620-marostegui.json
* 13:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18657 and previous config saved to /var/cache/conftool/dbconfig/20220112-134103-root.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753441{{!}}Disable flaggedrevs stable template inclusion in ruwikisource (T226054)]] (duration: 01m 08s)
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18656 and previous config saved to /var/cache/conftool/dbconfig/20220112-132600-root.json
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
* 13:20 urbanecm@deploy1002: Finished scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set (duration: 19m 19s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard1002.eqiad.wmnet
* 13:18 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
* 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard1002.eqiad.wmnet
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18655 and previous config saved to /var/cache/conftool/dbconfig/20220112-131056-root.json
* 13:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor1002.eqiad.wmnet
* 13:01 urbanecm@deploy1002: Started scap: {{Gerrit|4b1e241}}: Undo update to the way the search interface is set
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18654 and previous config saved to /var/cache/conftool/dbconfig/20220112-130050-marostegui.json
* 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor1002.eqiad.wmnet
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18653 and previous config saved to /var/cache/conftool/dbconfig/20220112-125552-root.json
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid1002.eqiad.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s7 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P18652 and previous config saved to /var/cache/conftool/dbconfig/20220112-125402-marostegui.json
* 12:52 awight: EU deployment reopened :-)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P18651 and previous config saved to /var/cache/conftool/dbconfig/20220112-125208-marostegui.json
* 12:51 awight: EU deployment complete
* 12:50 awight@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/TemplateData: Backport: [[gerrit:752775{{!}}Allow aliases to be integers in addition to strings (T298795)]] (duration: 01m 07s)
* 12:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid1002.eqiad.wmnet
* 12:48 Amir1: removing orphan lint error reports in all wikis ([[phab:T298782|T298782]])
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18650 and previous config saved to /var/cache/conftool/dbconfig/20220112-124514-marostegui.json
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18649 and previous config saved to /var/cache/conftool/dbconfig/20220112-123010-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18648 and previous config saved to /var/cache/conftool/dbconfig/20220112-122742-marostegui.json
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18647 and previous config saved to /var/cache/conftool/dbconfig/20220112-121505-marostegui.json
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cfe389afce8037121f8e8b672f4fdf2458a068dd}}: fawiki: Add extendedmover usergroup ([[phab:T299038|T299038]]) (duration: 01m 08s)
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1002.eqiad.wmnet
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18646 and previous config saved to /var/cache/conftool/dbconfig/20220112-120931-marostegui.json
* 12:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1002.eqiad.wmnet
* 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1001.eqiad.wmnet
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1001.eqiad.wmnet
* 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases1002.eqiad.wmnet
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18645 and previous config saved to /var/cache/conftool/dbconfig/20220112-120000-marostegui.json
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases1002.eqiad.wmnet
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18644 and previous config saved to /var/cache/conftool/dbconfig/20220112-115259-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18643 and previous config saved to /var/cache/conftool/dbconfig/20220112-115031-marostegui.json
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18642 and previous config saved to /var/cache/conftool/dbconfig/20220112-115024-marostegui.json
* 11:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 11:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18641 and previous config saved to /var/cache/conftool/dbconfig/20220112-113518-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18640 and previous config saved to /var/cache/conftool/dbconfig/20220112-113119-marostegui.json
* 11:21 elukey: move kafka-jumbo nodes to fixed kafka uid/gid - [[phab:T296990|T296990]]
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18639 and previous config saved to /var/cache/conftool/dbconfig/20220112-112013-marostegui.json
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18638 and previous config saved to /var/cache/conftool/dbconfig/20220112-110508-marostegui.json
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dborch1001.wikimedia.org
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dborch1001.wikimedia.org
* 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:59 moritzm: rebalance ganeti/codfw row B (all nodes reimaged to Buster)
* 10:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18637 and previous config saved to /var/cache/conftool/dbconfig/20220112-105650-marostegui.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18636 and previous config saved to /var/cache/conftool/dbconfig/20220112-105540-marostegui.json
* 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18635 and previous config saved to /var/cache/conftool/dbconfig/20220112-105532-marostegui.json
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dbmonitor1002.wikimedia.org
* 10:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply on main
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dbmonitor1002.wikimedia.org
* 10:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:47 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply on main
* 10:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:41 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18634 and previous config saved to /var/cache/conftool/dbconfig/20220112-104028-marostegui.json
* 10:39 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: sync on main
* 10:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply on main
* 10:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18633 and previous config saved to /var/cache/conftool/dbconfig/20220112-103619-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 10:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P18632 and previous config saved to /var/cache/conftool/dbconfig/20220112-103144-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18631 and previous config saved to /var/cache/conftool/dbconfig/20220112-102938-marostegui.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18630 and previous config saved to /var/cache/conftool/dbconfig/20220112-102523-marostegui.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18629 and previous config saved to /var/cache/conftool/dbconfig/20220112-101018-marostegui.json
* 10:08 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
* 10:06 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:57 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc1 (duration: 01m 07s)
* 09:54 hnowlan: Decommissioning cassandra instance restbase2009-b via nodetool
* 09:53 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
* 09:51 moritzm: reverting kubetcd2006 back to "plain" storage
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
* 09:51 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
* 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bullseye
* 09:21 moritzm: reverting kubetcd2005 back to "plain" storage
* 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
* 09:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bullseye
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18628 and previous config saved to /var/cache/conftool/dbconfig/20220112-090959-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc1 (duration: 01m 08s)
* 09:05 marostegui: Reset replication on pc1014
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18627 and previous config saved to /var/cache/conftool/dbconfig/20220112-085024-marostegui.json
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb1002.eqiad.wmnet
* 08:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb1002.eqiad.wmnet
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18626 and previous config saved to /var/cache/conftool/dbconfig/20220112-083520-marostegui.json
* 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1001.eqiad.wmnet
* 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1001.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18625 and previous config saved to /var/cache/conftool/dbconfig/20220112-082015-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18624 and previous config saved to /var/cache/conftool/dbconfig/20220112-080510-marostegui.json
* 08:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
* 07:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
* 07:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: sync on main
* 07:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
* 07:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
* 07:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
* 07:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 07:41 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 07:41 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 07:40 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 07:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 07:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 07:29 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 07:28 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18623 and previous config saved to /var/cache/conftool/dbconfig/20220112-072826-marostegui.json
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18622 and previous config saved to /var/cache/conftool/dbconfig/20220112-071003-marostegui.json
* 07:02 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
* 06:58 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
* 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
* 06:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
* 06:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
* 06:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
* 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18621 and previous config saved to /var/cache/conftool/dbconfig/20220112-065458-marostegui.json
* 06:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync on main
* 06:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
* 06:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
* 06:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
* 06:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
* 06:48 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18620 and previous config saved to /var/cache/conftool/dbconfig/20220112-063953-marostegui.json
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:36 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18619 and previous config saved to /var/cache/conftool/dbconfig/20220112-062449-marostegui.json
* 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18618 and previous config saved to /var/cache/conftool/dbconfig/20220112-060923-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18617 and previous config saved to /var/cache/conftool/dbconfig/20220112-060803-marostegui.json
* 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:09 urbanecm: UTC late evening B&C done
* 00:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24a26392a3e36aa3a46445eb1f87e808b57b19c8}}: Enable Disambiguator notifications for French Wikipedia ([[phab:T293319|T293319]]) (duration: 01m 08s)
* 00:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:03 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2015-12-19 ==
== 2022-01-11 ==
* 21:55 _joe_: restarted zotero on sca1001, various OOM messages
* 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:48 gwicke: restbase1004: `systemctl mask cassandra` in preparation for the decommission finishing
* 23:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:49 akosiaris: killed gmond on db2036. it was clearly misbehaving and running since Jan 02. db2036 was not listed on the ganglia web interface. killing the orphaned process and restarting seems to have fixed it
* 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:54 akosiaris: scheduled maintenance of s3 slave lag on db2036, db2043, db2050, db2057 (all of db2018's family that pages) to effectively silence pages while debugging. Check is flapping since 15:00 UTC today
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 15:14 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/259611/ - noop for prod, other than making icinga stop complaining (duration: 00m 31s)
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:07 hashar: CI jobs for MediaWiki were broken because of cssjanus dependency. Should be fixed once mw/core https://gerrit.wikimedia.org/r/#/c/260169/ lands
* 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:28 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Dec 19 02:28:56 UTC 2015 (duration 6m 53s)
* 23:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 53s)
* 23:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:01 gwicke: entire restbase cluster: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase$NODE--vg-{srv,var}
* 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:49 gwicke: restbase1008: removed 5% root reserve from data partition with tune2fs -m 0 /dev/mapper/restbase1008--vg-srv
* 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 23:05 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Backport: [[gerrit:753071{{!}}Watchlist API update: Call correct method (T298999)]] (duration: 02m 40s)
* 23:04 dduvall: syncing backport to fix VE regression that followed testwiki/group0 deployment (cc [[phab:T293958|T293958]])
* 21:29 mutante: mw1418 - apt-get remove --purge fonts*; apt-get remove --purge xfonts*; running puppet - nothing gets reinstalled and with --purge it means 'dpkg -l {{!}} grep fonts' is actually empty, not full of "rc" still - [[phab:T294378|T294378]]
* 21:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18615 and previous config saved to /var/cache/conftool/dbconfig/20220111-211134-marostegui.json
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18614 and previous config saved to /var/cache/conftool/dbconfig/20220111-205629-marostegui.json
* 20:56 mutante: mw1418 (lowest numbered canary appserver that we use for httpbb hourly tests on cumin1001) - apt-get autoremove - removed font* and python3* packages - reason: [[phab:T294378|T294378]]
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:42 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1009.eqiad.wmnet
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18613 and previous config saved to /var/cache/conftool/dbconfig/20220111-204124-marostegui.json
* 20:38 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1009.eqiad.wmnet
* 20:38 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 20:36 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1008.eqiad.wmnet
* 20:32 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1008.eqiad.wmnet
* 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1007.eqiad.wmnet
* 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1032.eqiad.wmnet
* 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1007.eqiad.wmnet
* 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1032.eqiad.wmnet
* 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1031.eqiad.wmnet
* 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1030.eqiad.wmnet
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18612 and previous config saved to /var/cache/conftool/dbconfig/20220111-202620-marostegui.json
* 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18611 and previous config saved to /var/cache/conftool/dbconfig/20220111-202513-marostegui.json
* 20:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 20:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18610 and previous config saved to /var/cache/conftool/dbconfig/20220111-202505-marostegui.json
* 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1031.eqiad.wmnet
* 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1030.eqiad.wmnet
* 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1024.eqiad.wmnet
* 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1025.eqiad.wmnet
* 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18609 and previous config saved to /var/cache/conftool/dbconfig/20220111-201000-marostegui.json
* 20:09 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1025.eqiad.wmnet
* 20:08 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1024.eqiad.wmnet
* 20:01 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]] (duration: 39m 38s)
* 19:59 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1023.eqiad.wmnet
* 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18608 and previous config saved to /var/cache/conftool/dbconfig/20220111-195456-marostegui.json
* 19:53 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1023.eqiad.wmnet
* 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18607 and previous config saved to /var/cache/conftool/dbconfig/20220111-193951-marostegui.json
* 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18606 and previous config saved to /var/cache/conftool/dbconfig/20220111-193844-marostegui.json
* 19:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 19:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18605 and previous config saved to /var/cache/conftool/dbconfig/20220111-193836-marostegui.json
* 19:30 sukhe: upload pdns-recursor_4.6.0-1wm1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18604 and previous config saved to /var/cache/conftool/dbconfig/20220111-192331-marostegui.json
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:21 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.17  refs [[phab:T293958|T293958]]
* 19:17 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1002.eqiad.wmnet
* 19:13 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1002.eqiad.wmnet
* 19:13 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1001.eqiad.wmnet
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18603 and previous config saved to /var/cache/conftool/dbconfig/20220111-190827-marostegui.json
* 19:05 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1001.eqiad.wmnet
* 19:05 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1002.wikimedia.org
* 19:04 dduvall@deploy1002: Pruned MediaWiki: 1.38.0-wmf.9 (duration: 15m 51s)
* 19:01 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1002.wikimedia.org
* 19:00 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1001.wikimedia.org
* 18:58 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1001.wikimedia.org
* 18:57 ebernhardson: clear wcqs.jnl and aliases.map for all wcqs instances [[phab:T296470|T296470]]
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18602 and previous config saved to /var/cache/conftool/dbconfig/20220111-185322-marostegui.json
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18601 and previous config saved to /var/cache/conftool/dbconfig/20220111-185215-marostegui.json
* 18:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18600 and previous config saved to /var/cache/conftool/dbconfig/20220111-185208-marostegui.json
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:41 _joe_: also ran apt-get autoremove on mwdebug1002
* 18:41 _joe_: installed scap 4.1.1 on mwdebug1002 [[phab:T298986|T298986]], ran scap pull successfully
* 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18599 and previous config saved to /var/cache/conftool/dbconfig/20220111-183703-marostegui.json
* 18:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS buster
* 18:29 _joe_: uploaded scap 4.1.1-1 to apt [[phab:T298986|T298986]]
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18598 and previous config saved to /var/cache/conftool/dbconfig/20220111-182158-marostegui.json
* 18:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
* 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18597 and previous config saved to /var/cache/conftool/dbconfig/20220111-180653-marostegui.json
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18596 and previous config saved to /var/cache/conftool/dbconfig/20220111-180547-marostegui.json
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18595 and previous config saved to /var/cache/conftool/dbconfig/20220111-180534-marostegui.json
* 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18594 and previous config saved to /var/cache/conftool/dbconfig/20220111-175029-marostegui.json
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
* 17:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18593 and previous config saved to /var/cache/conftool/dbconfig/20220111-173524-marostegui.json
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18592 and previous config saved to /var/cache/conftool/dbconfig/20220111-172019-marostegui.json
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18591 and previous config saved to /var/cache/conftool/dbconfig/20220111-171912-marostegui.json
* 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18590 and previous config saved to /var/cache/conftool/dbconfig/20220111-171905-marostegui.json
* 17:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 02m 04s)
* 17:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1002.eqiad.wmnet
* 17:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
* 17:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 03m 33s)
* 17:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1002.eqiad.wmnet
* 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1001.eqiad.wmnet
* 17:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
* 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:04 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18589 and previous config saved to /var/cache/conftool/dbconfig/20220111-170400-marostegui.json
* 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:03 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1001.eqiad.wmnet
* 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:00 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18588 and previous config saved to /var/cache/conftool/dbconfig/20220111-164856-marostegui.json
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18587 and previous config saved to /var/cache/conftool/dbconfig/20220111-163351-marostegui.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18586 and previous config saved to /var/cache/conftool/dbconfig/20220111-163244-marostegui.json
* 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18585 and previous config saved to /var/cache/conftool/dbconfig/20220111-163237-marostegui.json
* 16:29 arturo: aborrero@apt1001:~ $ sudo -i reprepro clearvanished
* 16:23 arturo: aborrero@apt1001:~ $ sudo -i reprepro --noskipold --component thirdparty/kubeadm-k8s-1-21 update buster-wikimedia
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18584 and previous config saved to /var/cache/conftool/dbconfig/20220111-161732-marostegui.json
* 16:03 cwhite: begin rolling restart of opensearch in codfw - jvm upgrade
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18583 and previous config saved to /var/cache/conftool/dbconfig/20220111-160227-marostegui.json
* 15:59 vgutierrez: re-enable puppet on acme-chief clients after acmechief1001 reboot - [[phab:T294120|T294120]]
* 15:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief1001.eqiad.wmnet
* 15:56 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief1001.eqiad.wmnet
* 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
* 15:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
* 15:55 vgutierrez: disable puppet on acme-chief clients for acmechief1001 reboot - [[phab:T294120|T294120]]
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test1001.eqiad.wmnet
* 15:51 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to resolve issue with full heap
* 15:47 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test1001.eqiad.wmnet
* 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18582 and previous config saved to /var/cache/conftool/dbconfig/20220111-154722-marostegui.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18580 and previous config saved to /var/cache/conftool/dbconfig/20220111-154615-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18579 and previous config saved to /var/cache/conftool/dbconfig/20220111-154608-marostegui.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18578 and previous config saved to /var/cache/conftool/dbconfig/20220111-153103-marostegui.json
* 15:30 hnowlan: Decommissioning cassandra instance restbase2009-a via nodetool
* 15:22 arnoldokoth: systemctl reset-failed ifup@ens5.service on otrs1001 [[phab:T273026|T273026]]
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18577 and previous config saved to /var/cache/conftool/dbconfig/20220111-151558-marostegui.json
* 15:10 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM otrs1001.eqiad.wmnet
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
* 15:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
* 15:02 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM otrs1001.eqiad.wmnet
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18576 and previous config saved to /var/cache/conftool/dbconfig/20220111-150054-marostegui.json
* 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM etherpad1002.eqiad.wmnet
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18575 and previous config saved to /var/cache/conftool/dbconfig/20220111-145947-marostegui.json
* 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18574 and previous config saved to /var/cache/conftool/dbconfig/20220111-145939-marostegui.json
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM zookeeper-test1002.eqiad.wmnet
* 14:56 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM etherpad1002.eqiad.wmnet
* 14:48 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM zookeeper-test1002.eqiad.wmnet
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping1002.eqiad.wmnet
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping1002.eqiad.wmnet
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18573 and previous config saved to /var/cache/conftool/dbconfig/20220111-144435-marostegui.json
* 14:38 XioNoX: disable ping-offload in eqiad
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:35 marostegui: Upgrade pc1014 mysql
* 14:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751949{{!}}Clean up nova-network remains]] (2/2) (duration: 02m 40s)
* 14:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:751949{{!}}Clean up nova-network remains]] (1/2) (duration: 02m 49s)
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18572 and previous config saved to /var/cache/conftool/dbconfig/20220111-142930-marostegui.json
* 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:25 taavi@deploy1002: Synchronized wmf-config/reverse-proxy.php: Config: [[gerrit:751952{{!}}reverse-proxy: add drmrs ranges (T282787)]] (duration: 01m 36s)
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bullseye
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18571 and previous config saved to /var/cache/conftool/dbconfig/20220111-141425-marostegui.json
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18570 and previous config saved to /var/cache/conftool/dbconfig/20220111-141318-marostegui.json
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 14:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18569 and previous config saved to /var/cache/conftool/dbconfig/20220111-141249-marostegui.json
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18568 and previous config saved to /var/cache/conftool/dbconfig/20220111-135744-marostegui.json
* 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bullseye
* 13:43 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18567 and previous config saved to /var/cache/conftool/dbconfig/20220111-134239-marostegui.json
* 13:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 13:36 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 13:33 moritzm: installing 4.9.290 kernels on stretch systems (no reboots yet)
* 13:29 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18565 and previous config saved to /var/cache/conftool/dbconfig/20220111-132734-marostegui.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18564 and previous config saved to /var/cache/conftool/dbconfig/20220111-132627-marostegui.json
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people1003.eqiad.wmnet
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people1003.eqiad.wmnet
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet1002.eqiad.wmnet
* 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet1002.eqiad.wmnet
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18563 and previous config saved to /var/cache/conftool/dbconfig/20220111-122143-marostegui.json
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:15 cparle@deploy1002: Synchronized wmf-config: Config: [[gerrit:752599{{!}}Enable support for references (T230315)]] (duration: 01m 00s)
* 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
* 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
* 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18562 and previous config saved to /var/cache/conftool/dbconfig/20220111-121025-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18561 and previous config saved to /var/cache/conftool/dbconfig/20220111-120638-marostegui.json
* 12:00 moritzm: reverting kubetcd2004.codfw.wmnet back to "plain" storage
* 11:56 moritzm: rebalance ganeti row A (all nodes reimaged to Buster)
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18560 and previous config saved to /var/cache/conftool/dbconfig/20220111-115522-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18559 and previous config saved to /var/cache/conftool/dbconfig/20220111-115133-marostegui.json
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18558 and previous config saved to /var/cache/conftool/dbconfig/20220111-114018-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18557 and previous config saved to /var/cache/conftool/dbconfig/20220111-113628-marostegui.json
* 11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18556 and previous config saved to /var/cache/conftool/dbconfig/20220111-113216-marostegui.json
* 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18555 and previous config saved to /var/cache/conftool/dbconfig/20220111-113208-marostegui.json
* 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18554 and previous config saved to /var/cache/conftool/dbconfig/20220111-112514-root.json
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18553 and previous config saved to /var/cache/conftool/dbconfig/20220111-111704-marostegui.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18551 and previous config saved to /var/cache/conftool/dbconfig/20220111-110159-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18550 and previous config saved to /var/cache/conftool/dbconfig/20220111-104654-marostegui.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18549 and previous config saved to /var/cache/conftool/dbconfig/20220111-103941-marostegui.json
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18548 and previous config saved to /var/cache/conftool/dbconfig/20220111-103927-marostegui.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18547 and previous config saved to /var/cache/conftool/dbconfig/20220111-102421-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18546 and previous config saved to /var/cache/conftool/dbconfig/20220111-100917-marostegui.json
* 09:58 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2019.codfw.wmnet with OS buster
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18545 and previous config saved to /var/cache/conftool/dbconfig/20220111-095408-marostegui.json
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18544 and previous config saved to /var/cache/conftool/dbconfig/20220111-095254-marostegui.json
* 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18543 and previous config saved to /var/cache/conftool/dbconfig/20220111-095246-marostegui.json
* 09:51 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
* 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster1001.eqiad.wmnet
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18542 and previous config saved to /var/cache/conftool/dbconfig/20220111-093741-marostegui.json
* 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
* 09:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster1001.eqiad.wmnet
* 09:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
* 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18541 and previous config saved to /var/cache/conftool/dbconfig/20220111-092706-ladsgroup.json
* 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2019.codfw.wmnet with OS buster
* 09:23 ema: cp4021 (upload), cp4027 (text): upgrade varnish to 6.0.9-1wm1 [[phab:T298758|T298758]]
* 09:23 hashar: Upgrading Jenkins and Apache on releases1002 & releases2002
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18540 and previous config saved to /var/cache/conftool/dbconfig/20220111-092236-marostegui.json
* 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2078.codfw.wmnet with OS bullseye
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
* 09:13 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18539 and previous config saved to /var/cache/conftool/dbconfig/20220111-091201-ladsgroup.json
* 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS buster
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18538 and previous config saved to /var/cache/conftool/dbconfig/20220111-090732-marostegui.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18537 and previous config saved to /var/cache/conftool/dbconfig/20220111-090119-marostegui.json
* 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18536 and previous config saved to /var/cache/conftool/dbconfig/20220111-090111-marostegui.json
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18535 and previous config saved to /var/cache/conftool/dbconfig/20220111-085656-ladsgroup.json
* 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2078.codfw.wmnet with OS bullseye
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18534 and previous config saved to /var/cache/conftool/dbconfig/20220111-084606-marostegui.json
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18533 and previous config saved to /var/cache/conftool/dbconfig/20220111-084151-ladsgroup.json
* 08:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS buster
* 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2124.codfw.wmnet
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2124.codfw.wmnet
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18532 and previous config saved to /var/cache/conftool/dbconfig/20220111-083322-ladsgroup.json
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18531 and previous config saved to /var/cache/conftool/dbconfig/20220111-083314-ladsgroup.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18530 and previous config saved to /var/cache/conftool/dbconfig/20220111-083102-marostegui.json
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bullseye
* 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18529 and previous config saved to /var/cache/conftool/dbconfig/20220111-081809-ladsgroup.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18528 and previous config saved to /var/cache/conftool/dbconfig/20220111-081557-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18527 and previous config saved to /var/cache/conftool/dbconfig/20220111-081442-marostegui.json
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18526 and previous config saved to /var/cache/conftool/dbconfig/20220111-081400-marostegui.json
* 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18525 and previous config saved to /var/cache/conftool/dbconfig/20220111-080305-ladsgroup.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18524 and previous config saved to /var/cache/conftool/dbconfig/20220111-075856-marostegui.json
* 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bullseye
* 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18523 and previous config saved to /var/cache/conftool/dbconfig/20220111-074800-ladsgroup.json
* 07:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2117.codfw.wmnet
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18522 and previous config saved to /var/cache/conftool/dbconfig/20220111-074351-marostegui.json
* 07:42 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2117.codfw.wmnet
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18521 and previous config saved to /var/cache/conftool/dbconfig/20220111-074202-ladsgroup.json
* 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18520 and previous config saved to /var/cache/conftool/dbconfig/20220111-074154-ladsgroup.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18519 and previous config saved to /var/cache/conftool/dbconfig/20220111-072847-marostegui.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18518 and previous config saved to /var/cache/conftool/dbconfig/20220111-072649-ladsgroup.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18517 and previous config saved to /var/cache/conftool/dbconfig/20220111-071729-marostegui.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18516 and previous config saved to /var/cache/conftool/dbconfig/20220111-071721-marostegui.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18515 and previous config saved to /var/cache/conftool/dbconfig/20220111-071254-root.json
* 07:12 taavi: extensions/CentralAuth/maintenance/migrateHiddenLevel.php finished - [[phab:T289068|T289068]]
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18514 and previous config saved to /var/cache/conftool/dbconfig/20220111-071144-ladsgroup.json
* 07:07 marostegui: Failover m2 proxy from dbproxy1015 to dbproxy1013 [[phab:T298586|T298586]]
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18513 and previous config saved to /var/cache/conftool/dbconfig/20220111-070216-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18512 and previous config saved to /var/cache/conftool/dbconfig/20220111-065750-root.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18511 and previous config saved to /var/cache/conftool/dbconfig/20220111-065640-ladsgroup.json
* 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2114.codfw.wmnet
* 06:51 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2114.codfw.wmnet
* 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P18510 and previous config saved to /var/cache/conftool/dbconfig/20220111-065118-ladsgroup.json
* 06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 06:50 Amir1: upgrading mysql on ['db2114', 'db2117', 'db2124']
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18509 and previous config saved to /var/cache/conftool/dbconfig/20220111-064712-marostegui.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18508 and previous config saved to /var/cache/conftool/dbconfig/20220111-064247-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18507 and previous config saved to /var/cache/conftool/dbconfig/20220111-063207-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18506 and previous config saved to /var/cache/conftool/dbconfig/20220111-063052-marostegui.json
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1012.eqiad.wmnet with OS bullseye
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18505 and previous config saved to /var/cache/conftool/dbconfig/20220111-062743-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2032 after Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18504 and previous config saved to /var/cache/conftool/dbconfig/20220111-062620-marostegui.json
* 06:21 taavi: starting extensions/CentralAuth/maintenance/migrateHiddenLevel.php on a mwmaint1002 screen session - [[phab:T289068|T289068]]
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1012.eqiad.wmnet with OS bullseye
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18503 and previous config saved to /var/cache/conftool/dbconfig/20220111-054417-marostegui.json
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 02:41 eileen: revision {{Gerrit|d90542c2}} -> {{Gerrit|2956a622}} (latest)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 01:42 eileen: revision {{Gerrit|277989d7}} -> {{Gerrit|d90542c2}} (latest) civicrm
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/skins/Vector/resources/skins.vector.js/dropdownMenus.js: {{Gerrit|79b33f2}}: Fix TypeError: document.querySelectorAll(...).forEach is not a function ([[phab:T298910|T298910]]) (duration: 00m 59s)
* 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn


== 2015-12-18 ==
== 2022-01-10 ==
* 22:57 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki/mediawiki.searchSuggest.js: allow override of suggestion type reported in event logging (duration: 00m 29s)
* 22:36 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
* 22:56 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/extensions/CirrusSearch/resources/ext.cirrus.suggest.js: override suggestion type reported in event logging (duration: 00m 30s)
* 22:34 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
* 22:50 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.9/includes/jobqueue/aggregator/JobQueueAggregatorRedis.php: 2c942ba1782c42ee68622278a5e0a77e9027945d (duration: 00m 30s)
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18502 and previous config saved to /var/cache/conftool/dbconfig/20220110-202728-marostegui.json
* 22:30 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.9/extensions/CirrusSearch/resources/ext.cirrus.suggest.js: override suggestion type reported in event logging (duration: 00m 30s)
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18501 and previous config saved to /var/cache/conftool/dbconfig/20220110-201224-marostegui.json
* 22:20 logmsgbot: aaron@tin Synchronized php-1.27.0-wmf.9/includes/jobqueue/aggregator/JobQueueAggregator.php: 2c942ba1782c42ee68622278a5e0a77e9027945d (duration: 00m 31s)
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18500 and previous config saved to /var/cache/conftool/dbconfig/20220110-195719-marostegui.json
* 19:26 logmsgbot: aaron@tin Synchronized wmf-config/jobqueue-eqiad.php: Adjust queue "maxPartitionsTry" and timeouts (duration: 00m 30s)
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18499 and previous config saved to /var/cache/conftool/dbconfig/20220110-194214-marostegui.json
* 18:49 mutante: disregard that, apache config only is enough
* 19:32 ejegg: updated fundraising civicrm from {{Gerrit|3d334f30}} to {{Gerrit|277989d7}}
* 18:47 mutante: gerrit will restart in a moment and be right back
* 19:29 urbanecm: UTC evening B&C finished
* 18:44 ori: ditto
* 19:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8f5ca9af5ef04d1d19759cdf201fc0c7e4ee6fbc}}: Enable TheWikipediaLibrary on most wikis ([[phab:T288070|T288070]]) (duration: 01m 00s)
* 18:43 Krinkle: Created account "Krinkle" on collabwiki
* 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:28 twentyafterfour: restarted apache on iridium to deploy redirect script changes
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:20 jynus: restarting and reconfiguring mysql on db1047
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:57 godog: stop compactions on restbase1008
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:55 jynus: SET GLOBAL query_cache_type = 0; on db1025
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18497 and previous config saved to /var/cache/conftool/dbconfig/20220110-184154-marostegui.json
* 14:54 hashar: gallium: restarted apache2, was deadlocked/unresponsive somehow
* 18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:44 godog: update privatesettings with swift codfw configuration
* 18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:43 godog: set temp-url-key for mw:media account in swift codfw
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18496 and previous config saved to /var/cache/conftool/dbconfig/20220110-184147-marostegui.json
* 12:19 paravoid: upgrading tor on radium, rebooting for kernel upgrade
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18495 and previous config saved to /var/cache/conftool/dbconfig/20220110-182642-marostegui.json
* 12:18 _joe_: disabled puppet on all lvs hosts for a potentially harmful change (should be a noop)
* 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18494 and previous config saved to /var/cache/conftool/dbconfig/20220110-181137-marostegui.json
* 11:47 _joe_: restarted hhvm on mw1107, stuck at startup
* 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18493 and previous config saved to /var/cache/conftool/dbconfig/20220110-175633-marostegui.json
* 11:40 hashar: logstash: reorganized list of dashboards per section https://logstash.wikimedia.org/#/dashboard/elasticsearch/default
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18492 and previous config saved to /var/cache/conftool/dbconfig/20220110-175503-marostegui.json
* 09:43 akosiaris: rebooting planet1001, memory exhaustion, OOM showed up
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:20 hashar: Killed Zuul entirely, the queues were full / deadlocked. Patches need to be retriggered
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 06:47 gwicke: restbase1004: nodetool stop -- COMPACTION to avoid running out of disk space
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18491 and previous config saved to /var/cache/conftool/dbconfig/20220110-175455-marostegui.json
* 03:07 logmsgbot: ori@tin Synchronized php-1.27.0-wmf.9/includes/api/ApiStashEdit.php: ab32f4e740: Make ApiStashEdit use statsd metrics (duration: 00m 49s)
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18489 and previous config saved to /var/cache/conftool/dbconfig/20220110-173950-marostegui.json
* 02:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Dec 18 02:29:10 UTC 2015 (duration 6m 55s)
* 17:34 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1016.eqiad.wmnet
* 02:22 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 45s)
* 17:32 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1016.eqiad.wmnet
* 01:52 ori: re-enabled puppet on rdb* / mc*
* 17:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1015.eqiad.wmnet
* 01:25 ori: in preparation for Iaefb2d191e, disabling puppet on mc* and rdb*
* 17:28 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1015.eqiad.wmnet
* 01:21 logmsgbot: krinkle@tin Synchronized docroot and w: (no message) (duration: 00m 32s)
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18488 and previous config saved to /var/cache/conftool/dbconfig/20220110-172446-marostegui.json
* 00:53 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: Revert Nuke-Flow integration, doesn't work (duration: 00m 32s)
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1006.eqiad.wmnet
* 00:42 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: SWAT: Nuke support for Flow, part 3 (duration: 00m 32s)
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1006.eqiad.wmnet
* 00:34 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add completion suggester to BetaFeatures whitelist (duration: 00m 30s)
* 17:16 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
* 00:26 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: grumble grumble touch InitialiseSettings grumble (duration: 00m 30s)
* 17:14 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
* 00:25 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Flow: SWAT: Nuke support for Flow, part 2 (duration: 00m 32s)
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18487 and previous config saved to /var/cache/conftool/dbconfig/20220110-170941-marostegui.json
* 00:23 logmsgbot: catrope@tin Synchronized wmf-config/CirrusSearch-production.php: SWAT: enable completion suggester beta on all wikis except wikidata (duration: 00m 30s)
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18486 and previous config saved to /var/cache/conftool/dbconfig/20220110-170811-marostegui.json
* 00:23 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: enable completion suggester beta on all wikis except wikidata (duration: 00m 29s)
* 17:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 00:20 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/Nuke/: SWAT: Nuke support in Flow, part 1 (duration: 00m 30s)
* 17:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 00:18 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki.messagePoster/mediawiki.messagePoster.factory.js: SWAT: fix error in messagePoster (duration: 00m 29s)
* 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18485 and previous config saved to /var/cache/conftool/dbconfig/20220110-170804-marostegui.json
* 00:17 logmsgbot: catrope@tin Synchronized php-1.27.0-wmf.9/extensions/MobileFrontend: SWAT: Schema:MobileWebSectionUsage: always log the isTestA field (duration: 00m 31s)
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18484 and previous config saved to /var/cache/conftool/dbconfig/20220110-165259-marostegui.json
* 00:08 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: SWAT: cleanup (duration: 00m 30s)
* 16:52 ema: varnish 6.0.9-1wm1 uploaded to buster-wikimedia - component/varnish6 [[phab:T298758|T298758]]
* 00:00 ori: restarted mathoid on sca1001
* 16:47 moritzm: installing 5.10.84 kernels on bullseye hosts (no reboots involved, just installing the new kernels in parallel)
* 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18483 and previous config saved to /var/cache/conftool/dbconfig/20220110-163754-marostegui.json
* 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18482 and previous config saved to /var/cache/conftool/dbconfig/20220110-162249-marostegui.json
* 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18481 and previous config saved to /var/cache/conftool/dbconfig/20220110-162122-marostegui.json
* 16:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 16:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18480 and previous config saved to /var/cache/conftool/dbconfig/20220110-162114-marostegui.json
* 16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:19 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet
* 16:18 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:13 root@cumin1001: START - Cookbook sre.dns.netbox
* 16:09 damilare: process-control config {{Gerrit|ecf09aa0}} -> {{Gerrit|66e69bda}}
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18479 and previous config saved to /var/cache/conftool/dbconfig/20220110-160608-marostegui.json
* 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum1001.eqiad.wmnet
* 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1003.eqiad.wmnet
* 15:57 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1003.eqiad.wmnet
* 15:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum1001.eqiad.wmnet
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18478 and previous config saved to /var/cache/conftool/dbconfig/20220110-155103-marostegui.json
* 15:49 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
* 15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode1001.eqiad.wmnet
* 15:45 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode1001.eqiad.wmnet
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18476 and previous config saved to /var/cache/conftool/dbconfig/20220110-153559-marostegui.json
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18475 and previous config saved to /var/cache/conftool/dbconfig/20220110-153429-marostegui.json
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18474 and previous config saved to /var/cache/conftool/dbconfig/20220110-153421-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18472 and previous config saved to /var/cache/conftool/dbconfig/20220110-151917-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18471 and previous config saved to /var/cache/conftool/dbconfig/20220110-150412-marostegui.json
* 14:55 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb1002.eqiad.wmnet
* 14:51 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 14:49 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752277{{!}}Give priority to PreparedUpdate (T288639)]] (duration: 01m 00s)
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18470 and previous config saved to /var/cache/conftool/dbconfig/20220110-144907-marostegui.json
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18469 and previous config saved to /var/cache/conftool/dbconfig/20220110-144737-marostegui.json
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:36 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb1002.eqiad.wmnet
* 14:32 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1001.wikimedia.org
* 14:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1001.wikimedia.org
* 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
* 14:19 jelto: upload wmf-sre-laptop 0.5.3 deb package
* 14:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
* 14:07 jbond: disable puppet fleet wide for puppetdb restart
* 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 13:54 btullis: upgrading oozie packages in reprepro in order to pick up new log4j version
* 13:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2032.codfw.wmnet with OS bullseye
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18468 and previous config saved to /var/cache/conftool/dbconfig/20220110-131523-marostegui.json
* 13:02 moritzm: installing ghostscript security updates
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18467 and previous config saved to /var/cache/conftool/dbconfig/20220110-130018-marostegui.json
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18466 and previous config saved to /var/cache/conftool/dbconfig/20220110-124513-marostegui.json
* 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2032.codfw.wmnet with OS bullseye
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2032 for Bullseye reimage [[phab:T295965|T295965]]', diff saved to https://phabricator.wikimedia.org/P18465 and previous config saved to /var/cache/conftool/dbconfig/20220110-124222-marostegui.json
* 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:36 taavi: UTC morning deploys done
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:34 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752634{{!}}hewikisource: remove "קטע" namespace and its talk page (T298430)]] (duration: 00m 58s)
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18464 and previous config saved to /var/cache/conftool/dbconfig/20220110-123009-marostegui.json
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18463 and previous config saved to /var/cache/conftool/dbconfig/20220110-122847-marostegui.json
* 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18462 and previous config saved to /var/cache/conftool/dbconfig/20220110-122840-marostegui.json
* 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:24 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:752187{{!}}Growth: Add GEMentorDashboardDeploymentMode (T298792)]] (duration: 00m 59s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:751545{{!}}uzwiki: Amend Babel configuration (T131924)]] (duration: 00m 59s)
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18460 and previous config saved to /var/cache/conftool/dbconfig/20220110-121335-marostegui.json
* 12:10 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747868{{!}}Add MediaSearch profiles (T297863)]] (duration: 00m 59s)
* 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18459 and previous config saved to /var/cache/conftool/dbconfig/20220110-115830-marostegui.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18458 and previous config saved to /var/cache/conftool/dbconfig/20220110-114326-marostegui.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T297191|T297191]])', diff saved to https://phabricator.wikimedia.org/P18457 and previous config saved to /var/cache/conftool/dbconfig/20220110-114305-marostegui.json
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance