You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye)
imported>Stashbot
(ryankemper: T294805 Re-enabling puppet across eqiad elastic fleet: `ryankemper@cumin1001:~$ sudo cumin -b 8 'elastic1*' 'sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent'` tmux session `elastic`)
Line 1: Line 1:
== 2022-02-08 ==
* 00:12 ryankemper: [[phab:T294805|T294805]] Re-enabling puppet across eqiad elastic fleet: `ryankemper@cumin1001:~$ sudo cumin -b 8 'elastic1*' 'sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - [[phab:T294805|T294805]] - root" && sudo run-puppet-agent'` tmux session `elastic`
* 00:12 ryankemper: [[phab:T294805|T294805]] old psi masters are out, done with all elastic master operations
* 00:05 ryankemper: [[phab:T294805|T294805]] new psi masters `elastic1073`, `elastic1075`, and `elastic1083` are in
== 2022-02-07 ==
* 23:39 ryankemper: [[phab:T294805|T294805]] Removed old masters `elastic1034` and `elastic1038` (and `elastic1040` was removed earlier)
* 23:35 ryankemper: [[phab:T294805|T294805]] Bringing in new omega master `elastic1057`
* 23:31 ryankemper: [[phab:T294805|T294805]] Bringing in new omega master `elastic1076`
* 23:27 ryankemper: [[phab:T294805|T294805]] Bringing in new master `elastic1068`
* 23:27 ryankemper: [[phab:T294805|T294805]] Main search cluster all done, proceeding to `omega` cluster
* 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
* 23:17 cwhite: end opensearch upgrade (eqiad) [[phab:T299168|T299168]]
* 23:09 ryankemper: [[phab:T294805|T294805]] Kicking out the final master `elastic1036` (which is also the currently elected leader); after this we'll be back to 3 masters as intended
* 23:06 ryankemper: [[phab:T294805|T294805]] Running puppet and restarting elasticsearch services on `elastic1040` to make it no longer a master
* 23:04 ryankemper: [[phab:T294805|T294805]] Bringing in new master `elastic1081`: `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-psi-eqiad.service`
* 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
* 23:04 ryankemper: [[phab:T294805|T294805]] Bringing in new master `elastic1081`: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - [[phab:T294805|T294805]] - root" && sudo run-puppet-agent`
* 22:59 ryankemper: [[phab:T294805|T294805]] `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-omega-eqiad.service` on `elastic1074`
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
* 22:57 ryankemper: [[phab:T294805|T294805]] Running puppet agent on new master elastic1074.eqiad.wmnet: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - [[phab:T294805|T294805]] - root" && sudo run-puppet-agent`
* 22:48 ryankemper: [[phab:T294805|T294805]] Disabled puppet across all of elastic1* in preparation for bringing new master hosts in
* 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20235 and previous config saved to /var/cache/conftool/dbconfig/20220207-224733-ladsgroup.json
* 22:45 inflatador: [[phab:T294805|T294805]] puppet-merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/736118
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
* 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
* 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20234 and previous config saved to /var/cache/conftool/dbconfig/20220207-223228-ladsgroup.json
* 22:25 cwhite: begin opensearch upgrade (eqiad) [[phab:T299168|T299168]]
* 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
* 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20233 and previous config saved to /var/cache/conftool/dbconfig/20220207-221723-ladsgroup.json
* 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
* 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T300510|T300510]])', diff saved to https://phabricator.wikimedia.org/P20232 and previous config saved to /var/cache/conftool/dbconfig/20220207-221345-ladsgroup.json
* 22:11 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
* 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20231 and previous config saved to /var/cache/conftool/dbconfig/20220207-220218-ladsgroup.json
* 22:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
* 22:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
* 22:00 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20230 and previous config saved to /var/cache/conftool/dbconfig/20220207-215840-ladsgroup.json
* 21:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20229 and previous config saved to /var/cache/conftool/dbconfig/20220207-214335-ladsgroup.json
* 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20228 and previous config saved to /var/cache/conftool/dbconfig/20220207-213650-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T300510|T300510]])', diff saved to https://phabricator.wikimedia.org/P20227 and previous config saved to /var/cache/conftool/dbconfig/20220207-212830-ladsgroup.json
* 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
* 21:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
* 21:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 21:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 21:09 otto@deploy1002: Finished deploy [airflow-dags/analytics-test@6d936db]: (no justification provided) (duration: 00m 08s)
* 21:09 otto@deploy1002: Started deploy [airflow-dags/analytics-test@6d936db]: (no justification provided)
* 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
* 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1129.eqiad.wmnet with OS bullseye
* 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20225 and previous config saved to /var/cache/conftool/dbconfig/20220207-205620-ladsgroup.json
* 20:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
* 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20223 and previous config saved to /var/cache/conftool/dbconfig/20220207-204115-ladsgroup.json
* 20:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
* 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1129.eqiad.wmnet with OS bullseye
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T300510|T300510]])', diff saved to https://phabricator.wikimedia.org/P20222 and previous config saved to /var/cache/conftool/dbconfig/20220207-203120-ladsgroup.json
* 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 20:30 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided) (duration: 00m 08s)
* 20:30 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided)
* 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20221 and previous config saved to /var/cache/conftool/dbconfig/20220207-202611-ladsgroup.json
* 20:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
* 20:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
* 20:19 eileen: revision {{Gerrit|7dcdc017}} -> {{Gerrit|ccd5afc3}} civicrm update
* 20:19 eileen: revision {{Gerrit|7dcdc017}} -> {{Gerrit|ccd5afc3}}
* 20:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided) (duration: 00m 07s)
* 20:18 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided)
* 20:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
* 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20220 and previous config saved to /var/cache/conftool/dbconfig/20220207-201106-ladsgroup.json
* 20:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync on main
* 20:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply on main
* 20:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync on main
* 19:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
* 19:55 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
* 19:44 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided) (duration: 00m 08s)
* 19:44 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided)
* 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20219 and previous config saved to /var/cache/conftool/dbconfig/20220207-194020-ladsgroup.json
* 19:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20218 and previous config saved to /var/cache/conftool/dbconfig/20220207-194013-ladsgroup.json
* 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20217 and previous config saved to /var/cache/conftool/dbconfig/20220207-192508-ladsgroup.json
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
* 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20216 and previous config saved to /var/cache/conftool/dbconfig/20220207-191003-ladsgroup.json
* 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:05 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:758890{{!}}Turn on wgVectorLanguageAlertInSidebar for all wikis (T300559)]] (duration: 00m 49s)
* 19:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20215 and previous config saved to /var/cache/conftool/dbconfig/20220207-185459-ladsgroup.json
* 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20214 and previous config saved to /var/cache/conftool/dbconfig/20220207-183059-ladsgroup.json
* 18:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 18:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS buster
* 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20213 and previous config saved to /var/cache/conftool/dbconfig/20220207-180857-ladsgroup.json
* 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
* 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
* 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
* 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
* 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.wmnet
* 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2019.wmnet
* 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20212 and previous config saved to /var/cache/conftool/dbconfig/20220207-175352-ladsgroup.json
* 17:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS buster
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
* 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20211 and previous config saved to /var/cache/conftool/dbconfig/20220207-173848-ladsgroup.json
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS buster
* 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20210 and previous config saved to /var/cache/conftool/dbconfig/20220207-172343-ladsgroup.json
* 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20209 and previous config saved to /var/cache/conftool/dbconfig/20220207-165952-ladsgroup.json
* 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20208 and previous config saved to /var/cache/conftool/dbconfig/20220207-165944-ladsgroup.json
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS buster
* 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS buster
* 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20207 and previous config saved to /var/cache/conftool/dbconfig/20220207-164439-ladsgroup.json
* 16:41 moritzm: switch kubestagetcd2003 to plain disk storage
* 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
* 16:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
* 16:30 moritzm: switch kubestagetcd2002 to plain disk storage
* 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20206 and previous config saved to /var/cache/conftool/dbconfig/20220207-162935-ladsgroup.json
* 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
* 16:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
* 16:24 moritzm: switch kubestagetcd2001 to plain disk storage
* 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
* 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS buster
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20205 and previous config saved to /var/cache/conftool/dbconfig/20220207-161430-ladsgroup.json
* 16:05 moritzm: migrating instances off ganeti1021
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20204 and previous config saved to /var/cache/conftool/dbconfig/20220207-160441-ladsgroup.json
* 16:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20203 and previous config saved to /var/cache/conftool/dbconfig/20220207-160433-ladsgroup.json
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20201 and previous config saved to /var/cache/conftool/dbconfig/20220207-154928-ladsgroup.json
* 15:47 moritzm: installing pillow security updates
* 15:44 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 02m 30s)
* 15:41 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
* 15:40 jayme: updated scap to 4.3.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary, A:restbase-canary - [[phab:T300804|T300804]]
* 15:37 jayme: uploaded scap 4.3-0 to apt.w.o - [[phab:T300804|T300804]]
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20200 and previous config saved to /var/cache/conftool/dbconfig/20220207-153424-ladsgroup.json
* 15:30 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20199 and previous config saved to /var/cache/conftool/dbconfig/20220207-151917-ladsgroup.json
* 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20198 and previous config saved to /var/cache/conftool/dbconfig/20220207-151018-ladsgroup.json
* 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20197 and previous config saved to /var/cache/conftool/dbconfig/20220207-150959-ladsgroup.json
* 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20196 and previous config saved to /var/cache/conftool/dbconfig/20220207-145454-ladsgroup.json
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20195 and previous config saved to /var/cache/conftool/dbconfig/20220207-143950-ladsgroup.json
* 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20194 and previous config saved to /var/cache/conftool/dbconfig/20220207-142445-ladsgroup.json
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298554|T298554]])', diff saved to https://phabricator.wikimedia.org/P20193 and previous config saved to /var/cache/conftool/dbconfig/20220207-141452-ladsgroup.json
* 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 13:14 jbond: update ferm on bullseye
* 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1020.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
* 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
* 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
* 12:44 moritzm: installing ruby2.7 security updates
* 12:40 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
* 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:34 moritzm: revert kubestagetcd1006 to plain disk storage
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:32 taavi: UTC morning deploys done
* 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
* 12:32 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:760202{{!}}Ensure GlobalBlocking is not loaded without CentralAuth (T299371)]] (2/2) (duration: 00m 48s)
* 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
* 12:31 moritzm: revert kubestagetcd1005 to plain disk storage
* 12:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:760202{{!}}Ensure GlobalBlocking is not loaded without CentralAuth (T299371)]] (1/2) (duration: 00m 48s)
* 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:27 taavi@deploy1002: Synchronized w/robots.php: Config: [[gerrit:759300{{!}}Migrate $wmfRealm calls to $wmgRealm (T45956)]] (3/3) (duration: 00m 48s)
* 12:26 taavi@deploy1002: Synchronized wmf-config: Config: [[gerrit:759300{{!}}Migrate $wmfRealm calls to $wmgRealm (T45956)]] (2/3) (duration: 00m 48s)
* 12:25 taavi@deploy1002: Synchronized multiversion: Config: [[gerrit:759300{{!}}Migrate $wmfRealm calls to $wmgRealm (T45956)]] (1/3) (duration: 00m 48s)
* 12:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
* 12:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
* 12:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:759564{{!}}Remove redundant patrolmarks flag from patroller usergroup (T300913)]] (duration: 00m 48s)
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:17 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1009.eqiad.wmnet
* 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:09 taavi: taavi@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756014{{!}}Stop capturing media change tags (T286362)]] (2/2) (duration: 00m 50s)
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:08 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:756014{{!}}Stop capturing media change tags (T286362)]] (1/2) (duration: 00m 50s)
* 12:07 moritzm: revert kubestagetcd1004 to plain disk storage
* 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
* 12:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
* 11:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1008.eqiad.wmnet
* 11:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1007.eqiad.wmnet
* 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
* 11:18 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
* 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
* 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
* 11:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
* 11:00 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1006.eqiad.wmnet
* 10:51 mmandere: rolling upgrade of varnish from version 6.0.9 to 6.0.10 across DCs [[phab:T300264|T300264]]
* 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
* 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
* 10:22 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1005.eqiad.wmnet
* 09:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1004.eqiad.wmnet
* 09:21 godog: temp-disable mfa for 'filippo' - [[phab:T296629|T296629]]
* 09:09 jayme: uncordoned kubernetes1014 - [[phab:T301099|T301099]]
* 08:02 jayme: powercycle kubernetes1014 - [[phab:T301099|T301099]]
* 06:20 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
* 06:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
* 06:10 jayme: draining kubernetes1014
== 2022-02-05 ==
== 2022-02-05 ==
* 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
* 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye

Revision as of 00:12, 8 February 2022

2022-02-08

  • 00:12 ryankemper: T294805 Re-enabling puppet across eqiad elastic fleet: `ryankemper@cumin1001:~$ sudo cumin -b 8 'elastic1*' 'sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent'` tmux session `elastic`
  • 00:12 ryankemper: T294805 old psi masters are out, done with all elastic master operations
  • 00:05 ryankemper: T294805 new psi masters `elastic1073`, `elastic1075`, and `elastic1083` are in

2022-02-07

  • 23:39 ryankemper: T294805 Removed old masters `elastic1034` and `elastic1038` (and `elastic1040` was removed earlier)
  • 23:35 ryankemper: T294805 Bringing in new omega master `elastic1057`
  • 23:31 ryankemper: T294805 Bringing in new omega master `elastic1076`
  • 23:27 ryankemper: T294805 Bringing in new master `elastic1068`
  • 23:27 ryankemper: T294805 Main search cluster all done, proceeding to `omega` cluster
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:17 cwhite: end opensearch upgrade (eqiad) T299168
  • 23:09 ryankemper: T294805 Kicking out the final master `elastic1036` (which is also the currently elected leader); after this we'll be back to 3 masters as intended
  • 23:06 ryankemper: T294805 Running puppet and restarting elasticsearch services on `elastic1040` to make it no longer a master
  • 23:04 ryankemper: T294805 Bringing in new master `elastic1081`: `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-psi-eqiad.service`
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:04 ryankemper: T294805 Bringing in new master `elastic1081`: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent`
  • 22:59 ryankemper: T294805 `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-omega-eqiad.service` on `elastic1074`
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:57 ryankemper: T294805 Running puppet agent on new master elastic1074.eqiad.wmnet: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent`
  • 22:48 ryankemper: T294805 Disabled puppet across all of elastic1* in preparation for bringing new master hosts in
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20235 and previous config saved to /var/cache/conftool/dbconfig/20220207-224733-ladsgroup.json
  • 22:45 inflatador: T294805 puppet-merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/736118
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20234 and previous config saved to /var/cache/conftool/dbconfig/20220207-223228-ladsgroup.json
  • 22:25 cwhite: begin opensearch upgrade (eqiad) T299168
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20233 and previous config saved to /var/cache/conftool/dbconfig/20220207-221723-ladsgroup.json
  • 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20232 and previous config saved to /var/cache/conftool/dbconfig/20220207-221345-ladsgroup.json
  • 22:11 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20231 and previous config saved to /var/cache/conftool/dbconfig/20220207-220218-ladsgroup.json
  • 22:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:00 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20230 and previous config saved to /var/cache/conftool/dbconfig/20220207-215840-ladsgroup.json
  • 21:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20229 and previous config saved to /var/cache/conftool/dbconfig/20220207-214335-ladsgroup.json
  • 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20228 and previous config saved to /var/cache/conftool/dbconfig/20220207-213650-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20227 and previous config saved to /var/cache/conftool/dbconfig/20220207-212830-ladsgroup.json
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:09 otto@deploy1002: Finished deploy [airflow-dags/analytics-test@6d936db]: (no justification provided) (duration: 00m 08s)
  • 21:09 otto@deploy1002: Started deploy [airflow-dags/analytics-test@6d936db]: (no justification provided)
  • 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1129.eqiad.wmnet with OS bullseye
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20225 and previous config saved to /var/cache/conftool/dbconfig/20220207-205620-ladsgroup.json
  • 20:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20223 and previous config saved to /var/cache/conftool/dbconfig/20220207-204115-ladsgroup.json
  • 20:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1129.eqiad.wmnet with OS bullseye
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20222 and previous config saved to /var/cache/conftool/dbconfig/20220207-203120-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:30 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided) (duration: 00m 08s)
  • 20:30 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided)
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20221 and previous config saved to /var/cache/conftool/dbconfig/20220207-202611-ladsgroup.json
  • 20:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
  • 20:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
  • 20:19 eileen: revision 7dcdc017 -> ccd5afc3 civicrm update
  • 20:19 eileen: revision 7dcdc017 -> ccd5afc3
  • 20:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided) (duration: 00m 07s)
  • 20:18 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided)
  • 20:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20220 and previous config saved to /var/cache/conftool/dbconfig/20220207-201106-ladsgroup.json
  • 20:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 20:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 20:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 19:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 19:44 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided) (duration: 00m 08s)
  • 19:44 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided)
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20219 and previous config saved to /var/cache/conftool/dbconfig/20220207-194020-ladsgroup.json
  • 19:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20218 and previous config saved to /var/cache/conftool/dbconfig/20220207-194013-ladsgroup.json
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20217 and previous config saved to /var/cache/conftool/dbconfig/20220207-192508-ladsgroup.json
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20216 and previous config saved to /var/cache/conftool/dbconfig/20220207-191003-ladsgroup.json
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:05 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Turn on wgVectorLanguageAlertInSidebar for all wikis (T300559) (duration: 00m 49s)
  • 19:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20215 and previous config saved to /var/cache/conftool/dbconfig/20220207-185459-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20214 and previous config saved to /var/cache/conftool/dbconfig/20220207-183059-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS buster
  • 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20213 and previous config saved to /var/cache/conftool/dbconfig/20220207-180857-ladsgroup.json
  • 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.wmnet
  • 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2019.wmnet
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20212 and previous config saved to /var/cache/conftool/dbconfig/20220207-175352-ladsgroup.json
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS buster
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20211 and previous config saved to /var/cache/conftool/dbconfig/20220207-173848-ladsgroup.json
  • 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS buster
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20210 and previous config saved to /var/cache/conftool/dbconfig/20220207-172343-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20209 and previous config saved to /var/cache/conftool/dbconfig/20220207-165952-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20208 and previous config saved to /var/cache/conftool/dbconfig/20220207-165944-ladsgroup.json
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS buster
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS buster
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20207 and previous config saved to /var/cache/conftool/dbconfig/20220207-164439-ladsgroup.json
  • 16:41 moritzm: switch kubestagetcd2003 to plain disk storage
  • 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
  • 16:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
  • 16:30 moritzm: switch kubestagetcd2002 to plain disk storage
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20206 and previous config saved to /var/cache/conftool/dbconfig/20220207-162935-ladsgroup.json
  • 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
  • 16:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
  • 16:24 moritzm: switch kubestagetcd2001 to plain disk storage
  • 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
  • 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS buster
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20205 and previous config saved to /var/cache/conftool/dbconfig/20220207-161430-ladsgroup.json
  • 16:05 moritzm: migrating instances off ganeti1021
  • 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20204 and previous config saved to /var/cache/conftool/dbconfig/20220207-160441-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20203 and previous config saved to /var/cache/conftool/dbconfig/20220207-160433-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20201 and previous config saved to /var/cache/conftool/dbconfig/20220207-154928-ladsgroup.json
  • 15:47 moritzm: installing pillow security updates
  • 15:44 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 02m 30s)
  • 15:41 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:40 jayme: updated scap to 4.3.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary, A:restbase-canary - T300804
  • 15:37 jayme: uploaded scap 4.3-0 to apt.w.o - T300804
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20200 and previous config saved to /var/cache/conftool/dbconfig/20220207-153424-ladsgroup.json
  • 15:30 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20199 and previous config saved to /var/cache/conftool/dbconfig/20220207-151917-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20198 and previous config saved to /var/cache/conftool/dbconfig/20220207-151018-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20197 and previous config saved to /var/cache/conftool/dbconfig/20220207-150959-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20196 and previous config saved to /var/cache/conftool/dbconfig/20220207-145454-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20195 and previous config saved to /var/cache/conftool/dbconfig/20220207-143950-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20194 and previous config saved to /var/cache/conftool/dbconfig/20220207-142445-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20193 and previous config saved to /var/cache/conftool/dbconfig/20220207-141452-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:14 jbond: update ferm on bullseye
  • 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1020.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 12:44 moritzm: installing ruby2.7 security updates
  • 12:40 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:34 moritzm: revert kubestagetcd1006 to plain disk storage
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:32 taavi: UTC morning deploys done
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:32 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Ensure GlobalBlocking is not loaded without CentralAuth (T299371) (2/2) (duration: 00m 48s)
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:31 moritzm: revert kubestagetcd1005 to plain disk storage
  • 12:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Ensure GlobalBlocking is not loaded without CentralAuth (T299371) (1/2) (duration: 00m 48s)
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:27 taavi@deploy1002: Synchronized w/robots.php: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (3/3) (duration: 00m 48s)
  • 12:26 taavi@deploy1002: Synchronized wmf-config: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (2/3) (duration: 00m 48s)
  • 12:25 taavi@deploy1002: Synchronized multiversion: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (1/3) (duration: 00m 48s)
  • 12:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove redundant patrolmarks flag from patroller usergroup (T300913) (duration: 00m 48s)
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:17 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1009.eqiad.wmnet
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:09 taavi: taavi@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: Stop capturing media change tags (T286362) (2/2) (duration: 00m 50s)
  • 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:08 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop capturing media change tags (T286362) (1/2) (duration: 00m 50s)
  • 12:07 moritzm: revert kubestagetcd1004 to plain disk storage
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
  • 11:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1008.eqiad.wmnet
  • 11:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1007.eqiad.wmnet
  • 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 11:18 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:00 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1006.eqiad.wmnet
  • 10:51 mmandere: rolling upgrade of varnish from version 6.0.9 to 6.0.10 across DCs T300264
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
  • 10:22 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1005.eqiad.wmnet
  • 09:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1004.eqiad.wmnet
  • 09:21 godog: temp-disable mfa for 'filippo' - T296629
  • 09:09 jayme: uncordoned kubernetes1014 - T301099
  • 08:02 jayme: powercycle kubernetes1014 - T301099
  • 06:20 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
  • 06:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
  • 06:10 jayme: draining kubernetes1014

2022-02-05

  • 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 21:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 20:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 19:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 18:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 17:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 16:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 06:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 06:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 05:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye

2022-02-04

  • 23:43 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 23:43 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 23:02 inflatador: bking@deployment-puppetmaster04 local commit to public/private repo, see T299797 for more details
  • 22:37 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 22:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 19:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2002-dev.wikimedia.org with OS bullseye
  • 18:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2002-dev.wikimedia.org with OS bullseye
  • 17:00 arturo: add mcrouter 2022.01.31.00-1 to bullseye-wikimedia (T300578)
  • 16:48 jbond: update add new ferm package ferm_2.5.1-1+wmf11u2
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 elukey: unmask prometheus-mysqld-exporter.service and clean up the old @analytics + wmf_auto_restart units (service+timer) not used anymore on an-coord100[12]
  • 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:18 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1020.eqiad.wmnet with OS buster
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20174 and previous config saved to /var/cache/conftool/dbconfig/20220204-114117-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20173 and previous config saved to /var/cache/conftool/dbconfig/20220204-112613-root.json
  • 11:14 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1020.eqiad.wmnet with OS buster
  • 11:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20172 and previous config saved to /var/cache/conftool/dbconfig/20220204-111110-root.json
  • 11:07 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s1 codfw T263127', diff saved to https://phabricator.wikimedia.org/P20171 and previous config saved to /var/cache/conftool/dbconfig/20220204-110427-marostegui.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20170 and previous config saved to /var/cache/conftool/dbconfig/20220204-105606-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20165 and previous config saved to /var/cache/conftool/dbconfig/20220204-104102-root.json
  • 10:40 moritzm: rebalancing row A in ganeti/eqiad, all nodes of that row are now running Buster T296721
  • 10:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1008.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1008.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s4 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20164 and previous config saved to /var/cache/conftool/dbconfig/20220204-082010-marostegui.json
  • 07:18 elukey: `git checkout main.html` on miscweb1002:/srv/org/wikidata/query to avoid puppet corrective actions (and the host being listed in alarms)
  • 07:09 elukey: cleanup wmf_auto_restart_prometheus-mysqld-exporter@analytics-meta on an-test-coord1001 and unmasked wmf_auto_restart_prometheus-mysqld-exporter (now used)
  • 07:03 elukey: clean up wmf_auto_restart_prometheus-mysqld-exporter@matomo on matomo1002 (not used anymore, listed as failed)
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 schema change', diff saved to https://phabricator.wikimedia.org/P20163 and previous config saved to /var/cache/conftool/dbconfig/20220204-070003-marostegui.json
  • 06:00 legoktm: uploaded pygments 2.11.2 to apt.wm.o (T298399)
  • 02:48 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2035.codfw.wmnet
  • 02:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic2035.codfw.wmnet
  • 02:41 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2035.codfw.wmnet
  • 01:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:04 brennen: for-real end of utc late backport & config window
  • 01:04 brennen@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/Thanks/modules/ext.thanks.flowthank.js: Backport: Correct attribute for flow thanks (T300831) (duration: 00m 49s)
  • 00:50 brennen: reopening utc late backport window for Correct attribute for flow thanks (T300831)
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:12 cjming: end of UTC late backport & config window
  • 00:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update icons, wordmark for test wikis (T299512) (duration: 00m 49s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:10 cjming@deploy1002: Synchronized static/images/mobile/copyright/: Config: Update icons, wordmark for test wikis (T299512) (duration: 00m 53s)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-02-03

  • 23:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20159 and previous config saved to /var/cache/conftool/dbconfig/20220203-233447-marostegui.json
  • 23:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20158 and previous config saved to /var/cache/conftool/dbconfig/20220203-231942-marostegui.json
  • 23:15 ryankemper: T294805 Added a silence on alerts.wikimedia.org for `CirrusSearchJVMGCOldPoolFlatlined`
  • 23:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20157 and previous config saved to /var/cache/conftool/dbconfig/20220203-230437-marostegui.json
  • 22:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20156 and previous config saved to /var/cache/conftool/dbconfig/20220203-224933-marostegui.json
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20155 and previous config saved to /var/cache/conftool/dbconfig/20220203-223923-marostegui.json
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20154 and previous config saved to /var/cache/conftool/dbconfig/20220203-223916-marostegui.json
  • 22:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20153 and previous config saved to /var/cache/conftool/dbconfig/20220203-222411-marostegui.json
  • 22:18 ryankemper: T294805 Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&refresh=1m&from=now-3h&to=now as new hosts join the fleet
  • 22:18 ryankemper: T294805 Bringing in new eqiad hosts in batches of 4, with 15-20 mins between batches: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'elastic1*' 'sudo run-puppet-agent --force; sudo run-puppet-agent; sleep 900'` tmux session `es_eqiad`
  • 22:13 ryankemper: T294805 https://gerrit.wikimedia.org/r/c/operations/puppet/+/759617/ fixed the dependency issues, going to start bringing new hosts into service
  • 22:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20152 and previous config saved to /var/cache/conftool/dbconfig/20220203-220906-marostegui.json
  • 22:05 eileen: civicrm revision 7dcdc017 -> 04cbf35b
  • 22:04 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20150 and previous config saved to /var/cache/conftool/dbconfig/20220203-215402-marostegui.json
  • 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20149 and previous config saved to /var/cache/conftool/dbconfig/20220203-215154-marostegui.json
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20148 and previous config saved to /var/cache/conftool/dbconfig/20220203-215121-marostegui.json
  • 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20147 and previous config saved to /var/cache/conftool/dbconfig/20220203-213616-marostegui.json
  • 21:28 rzl: root@apt1001:/home/rzl# reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy # T300324
  • 21:27 rzl: root@apt1001:/home/rzl# reprepro copy stretch-wikimedia buster-wikimedia envoyproxy # T300324
  • 21:21 ryankemper: T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/759588; hoping this resolves dependency issues. Running puppet agent on `elastic1068`
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20145 and previous config saved to /var/cache/conftool/dbconfig/20220203-212111-marostegui.json
  • 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20144 and previous config saved to /var/cache/conftool/dbconfig/20220203-210607-marostegui.json
  • 21:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20143 and previous config saved to /var/cache/conftool/dbconfig/20220203-210358-marostegui.json
  • 21:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 21:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 21:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20142 and previous config saved to /var/cache/conftool/dbconfig/20220203-210350-marostegui.json
  • 20:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20140 and previous config saved to /var/cache/conftool/dbconfig/20220203-204846-marostegui.json
  • 20:43 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_recount_categories.service # T299823
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20139 and previous config saved to /var/cache/conftool/dbconfig/20220203-203341-marostegui.json
  • 20:26 ryankemper: T294805 Running puppet on `elastic1068` failed, looks like `/usr/share/elasticsearch/lib` wasn't there: https://phabricator.wikimedia.org/P20138
  • 20:26 ryankemper: T294805 Running puppet on `elastic1068` failed, looks like `/usr/share/elasticsearch/lib' wasn't there: https://phabricator.wikimedia.org/P20138
  • 20:25 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: systemd testing
  • 20:25 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: systemd testing
  • 20:22 ryankemper: T294805 Running puppet on single elastic host: `ryankemper@elastic1068:~$ sudo run-puppet-agent --force`
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20137 and previous config saved to /var/cache/conftool/dbconfig/20220203-201836-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20136 and previous config saved to /var/cache/conftool/dbconfig/20220203-201729-marostegui.json
  • 20:17 ryankemper: T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/759317 to activate roles for elastic eqiad replacement hosts
  • 20:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 20:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300402)', diff saved to https://phabricator.wikimedia.org/P20135 and previous config saved to /var/cache/conftool/dbconfig/20220203-201721-marostegui.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:16 ryankemper: T294805 Disabled puppet on `elastic1*` in preparation for bringing new hosts into service: `ryankemper@cumin1001:~$ sudo cumin 'elastic1*' 'sudo disable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805"'`
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 20:11 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.20 refs T293961
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:08 mutante: planet1002/planet2002 - sudo systemctl start planet-update-en to manually start update after adding diff.wikimedia.org T230444
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/includes/Hooks.php: Backport: Drop skin override (T300814) (2/2) (duration: 00m 49s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:06 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/skin.json: Backport: Drop skin override (T300814) (1/2) (duration: 00m 49s)
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20134 and previous config saved to /var/cache/conftool/dbconfig/20220203-200217-marostegui.json
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20133 and previous config saved to /var/cache/conftool/dbconfig/20220203-194712-marostegui.json
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:41 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:39 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:34 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/includes/Hooks.php: Backport: Pass skin name to Hooks::isSkinLegacy (T299971) (duration: 00m 49s)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/ContentTranslation/modules/entrypoints/ext.cx.entrypoints.contributionsmenu.js: Backport: Update skin checks with new vector skin key. (T298916 T300814) (duration: 00m 50s)
  • 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300402)', diff saved to https://phabricator.wikimedia.org/P20132 and previous config saved to /var/cache/conftool/dbconfig/20220203-193208-marostegui.json
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/WikiEditor/modules/ext.wikiEditor.js: Backport: New bucket for abtest data (T291308) (2/2) (duration: 00m 50s)
  • 19:28 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/WikiEditor/includes/Hooks.php: Backport: New bucket for abtest data (T291308) (1/2) (duration: 00m 49s)
  • 19:27 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.trackSubscriber.js: Backport: New bucket for abtest data (T291308) (duration: 00m 50s)
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:26 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add three domains to the wgCopyUploadsDomains allowlist (T299835 T300848) (duration: 00m 54s)
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300402)', diff saved to https://phabricator.wikimedia.org/P20131 and previous config saved to /var/cache/conftool/dbconfig/20220203-183648-marostegui.json
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300402)', diff saved to https://phabricator.wikimedia.org/P20130 and previous config saved to /var/cache/conftool/dbconfig/20220203-183634-marostegui.json
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P20129 and previous config saved to /var/cache/conftool/dbconfig/20220203-182129-marostegui.json
  • 18:17 dancy: restarted php7.2-fpm processes on mediawiki12
  • 18:10 dancy: killed 8 spinning php7.2-fpm processes on mediawiki12
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P20128 and previous config saved to /var/cache/conftool/dbconfig/20220203-180624-marostegui.json
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300402)', diff saved to https://phabricator.wikimedia.org/P20127 and previous config saved to /var/cache/conftool/dbconfig/20220203-175120-marostegui.json
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300402)', diff saved to https://phabricator.wikimedia.org/P20126 and previous config saved to /var/cache/conftool/dbconfig/20220203-174913-marostegui.json
  • 17:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 17:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300402)', diff saved to https://phabricator.wikimedia.org/P20125 and previous config saved to /var/cache/conftool/dbconfig/20220203-174905-marostegui.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P20122 and previous config saved to /var/cache/conftool/dbconfig/20220203-173400-marostegui.json
  • 17:22 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts restbase2011.codfw.wmnet
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P20120 and previous config saved to /var/cache/conftool/dbconfig/20220203-171856-marostegui.json
  • 17:13 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2011.codfw.wmnet
  • 17:12 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts restbase2011.codfw.wmnet
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300402)', diff saved to https://phabricator.wikimedia.org/P20118 and previous config saved to /var/cache/conftool/dbconfig/20220203-170351-marostegui.json
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300402)', diff saved to https://phabricator.wikimedia.org/P20117 and previous config saved to /var/cache/conftool/dbconfig/20220203-170144-marostegui.json
  • 17:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20116 and previous config saved to /var/cache/conftool/dbconfig/20220203-170136-marostegui.json
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20115 and previous config saved to /var/cache/conftool/dbconfig/20220203-164632-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20114 and previous config saved to /var/cache/conftool/dbconfig/20220203-163127-marostegui.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298558)', diff saved to https://phabricator.wikimedia.org/P20113 and previous config saved to /var/cache/conftool/dbconfig/20220203-162316-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20111 and previous config saved to /var/cache/conftool/dbconfig/20220203-161622-marostegui.json
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20110 and previous config saved to /var/cache/conftool/dbconfig/20220203-161515-marostegui.json
  • 16:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300402)', diff saved to https://phabricator.wikimedia.org/P20109 and previous config saved to /var/cache/conftool/dbconfig/20220203-161508-marostegui.json
  • 16:10 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20108 and previous config saved to /var/cache/conftool/dbconfig/20220203-160811-marostegui.json
  • 16:00 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2011.codfw.wmnet
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20107 and previous config saved to /var/cache/conftool/dbconfig/20220203-160003-marostegui.json
  • 15:55 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts restbase2011.codfw.wmnet
  • 15:55 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2011.codfw.wmnet
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20106 and previous config saved to /var/cache/conftool/dbconfig/20220203-155306-marostegui.json
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20105 and previous config saved to /var/cache/conftool/dbconfig/20220203-154458-marostegui.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298558)', diff saved to https://phabricator.wikimedia.org/P20104 and previous config saved to /var/cache/conftool/dbconfig/20220203-153801-marostegui.json
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298558)', diff saved to https://phabricator.wikimedia.org/P20103 and previous config saved to /var/cache/conftool/dbconfig/20220203-153653-marostegui.json
  • 15:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 15:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20102 and previous config saved to /var/cache/conftool/dbconfig/20220203-153646-marostegui.json
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300402)', diff saved to https://phabricator.wikimedia.org/P20101 and previous config saved to /var/cache/conftool/dbconfig/20220203-152953-marostegui.json
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300402)', diff saved to https://phabricator.wikimedia.org/P20100 and previous config saved to /var/cache/conftool/dbconfig/20220203-152746-marostegui.json
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300402)', diff saved to https://phabricator.wikimedia.org/P20099 and previous config saved to /var/cache/conftool/dbconfig/20220203-152739-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20098 and previous config saved to /var/cache/conftool/dbconfig/20220203-152141-marostegui.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20097 and previous config saved to /var/cache/conftool/dbconfig/20220203-151234-marostegui.json
  • 15:12 moritzm: installing apache security updates on gerrit1001
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20096 and previous config saved to /var/cache/conftool/dbconfig/20220203-150636-marostegui.json
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20095 and previous config saved to /var/cache/conftool/dbconfig/20220203-145729-marostegui.json
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20094 and previous config saved to /var/cache/conftool/dbconfig/20220203-145132-marostegui.json
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20093 and previous config saved to /var/cache/conftool/dbconfig/20220203-145024-marostegui.json
  • 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298558)', diff saved to https://phabricator.wikimedia.org/P20092 and previous config saved to /var/cache/conftool/dbconfig/20220203-145017-marostegui.json
  • 14:44 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300402)', diff saved to https://phabricator.wikimedia.org/P20091 and previous config saved to /var/cache/conftool/dbconfig/20220203-144224-marostegui.json
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300402)', diff saved to https://phabricator.wikimedia.org/P20090 and previous config saved to /var/cache/conftool/dbconfig/20220203-144017-marostegui.json
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300402)', diff saved to https://phabricator.wikimedia.org/P20089 and previous config saved to /var/cache/conftool/dbconfig/20220203-143544-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20088 and previous config saved to /var/cache/conftool/dbconfig/20220203-143512-marostegui.json
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20087 and previous config saved to /var/cache/conftool/dbconfig/20220203-142039-marostegui.json
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20086 and previous config saved to /var/cache/conftool/dbconfig/20220203-142007-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20085 and previous config saved to /var/cache/conftool/dbconfig/20220203-140534-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298558)', diff saved to https://phabricator.wikimedia.org/P20084 and previous config saved to /var/cache/conftool/dbconfig/20220203-140503-marostegui.json
  • 13:53 XioNoX: eqiad: push Capirca generated border-in filters
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300402)', diff saved to https://phabricator.wikimedia.org/P20083 and previous config saved to /var/cache/conftool/dbconfig/20220203-135029-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298558)', diff saved to https://phabricator.wikimedia.org/P20082 and previous config saved to /var/cache/conftool/dbconfig/20220203-134952-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298558)', diff saved to https://phabricator.wikimedia.org/P20081 and previous config saved to /var/cache/conftool/dbconfig/20220203-134944-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300402)', diff saved to https://phabricator.wikimedia.org/P20080 and previous config saved to /var/cache/conftool/dbconfig/20220203-134746-marostegui.json
  • 13:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300402)', diff saved to https://phabricator.wikimedia.org/P20079 and previous config saved to /var/cache/conftool/dbconfig/20220203-134739-marostegui.json
  • 13:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:40 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 13:35 jbond: disable puppet fleet wide for puppetdb restart
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20078 and previous config saved to /var/cache/conftool/dbconfig/20220203-133439-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20077 and previous config saved to /var/cache/conftool/dbconfig/20220203-133234-marostegui.json
  • 13:28 marostegui: Test T300858
  • 13:28 moritzm: installing apache security updates
  • 13:27 jayme: moved kubernetes staging master,nodes,etcd from wikimedia_cluster "kubernetes" to "kubernetes-staging" - T273866
  • 13:27 XioNoX: esams: push Capirca generated border-in filters
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20076 and previous config saved to /var/cache/conftool/dbconfig/20220203-131935-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20075 and previous config saved to /var/cache/conftool/dbconfig/20220203-131729-marostegui.json
  • 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298558)', diff saved to https://phabricator.wikimedia.org/P20074 and previous config saved to /var/cache/conftool/dbconfig/20220203-130430-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300402)', diff saved to https://phabricator.wikimedia.org/P20073 and previous config saved to /var/cache/conftool/dbconfig/20220203-130224-marostegui.json
  • 12:58 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 12:57 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on external
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300402)', diff saved to https://phabricator.wikimedia.org/P20072 and previous config saved to /var/cache/conftool/dbconfig/20220203-125737-marostegui.json
  • 12:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300402)', diff saved to https://phabricator.wikimedia.org/P20071 and previous config saved to /var/cache/conftool/dbconfig/20220203-125730-marostegui.json
  • 12:53 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 12:53 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on internal
  • 12:53 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on external
  • 12:52 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 12:51 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on external
  • 12:49 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 12:49 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on internal
  • 12:49 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on external
  • 12:49 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:48 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:44 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 12:44 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 12:44 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 12:44 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 12:44 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 12:43 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 12:43 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20069 and previous config saved to /var/cache/conftool/dbconfig/20220203-124225-marostegui.json
  • 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:38 taavi: UTC morning backport window done
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: mniwiktionary: Add localized mobile wordmark (T294709) (2/2) (duration: 00m 49s)
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:32 taavi@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-mni.svg: Config: mniwiktionary: Add localized mobile wordmark (T294709) (1/2) (duration: 00m 50s)
  • 12:29 XioNoX: eqsin: push Capirca generated border-in filters
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20068 and previous config saved to /var/cache/conftool/dbconfig/20220203-122720-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298558)', diff saved to https://phabricator.wikimedia.org/P20067 and previous config saved to /var/cache/conftool/dbconfig/20220203-122612-marostegui.json
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298558)', diff saved to https://phabricator.wikimedia.org/P20066 and previous config saved to /var/cache/conftool/dbconfig/20220203-122529-marostegui.json
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:19 XioNoX: codfw: push Capirca generated border-in filters
  • 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add www.gbols.smns-bw.org to the wgCopyUploadsDomains allowlist (T300842) (duration: 00m 50s)
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300402)', diff saved to https://phabricator.wikimedia.org/P20065 and previous config saved to /var/cache/conftool/dbconfig/20220203-121216-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20064 and previous config saved to /var/cache/conftool/dbconfig/20220203-121024-marostegui.json
  • 12:10 XioNoX: eqord: push Capirca generated border-in filters
  • 12:09 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [WikibaseMediaInfo] Stop normalizing full text scores (T296631) (duration: 00m 52s)
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300402)', diff saved to https://phabricator.wikimedia.org/P20063 and previous config saved to /var/cache/conftool/dbconfig/20220203-120832-marostegui.json
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300402)', diff saved to https://phabricator.wikimedia.org/P20062 and previous config saved to /var/cache/conftool/dbconfig/20220203-120825-marostegui.json
  • 11:57 kart_: Updated cxserver to 2022-02-03-112745-production, this should unbreak Flores MT!
  • 11:57 XioNoX: ulsfo: push Capirca generated border-in filters
  • 11:55 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: sync on production
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20061 and previous config saved to /var/cache/conftool/dbconfig/20220203-115519-marostegui.json
  • 11:53 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply on staging
  • 11:53 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply on production
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20060 and previous config saved to /var/cache/conftool/dbconfig/20220203-115320-marostegui.json
  • 11:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: sync on production
  • 11:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply on staging
  • 11:49 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply on production
  • 11:47 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: sync on staging
  • 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply on production
  • 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply on staging
  • 11:45 moritzm: installing openjdk-11 security updates
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298558)', diff saved to https://phabricator.wikimedia.org/P20059 and previous config saved to /var/cache/conftool/dbconfig/20220203-114015-marostegui.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298558)', diff saved to https://phabricator.wikimedia.org/P20058 and previous config saved to /var/cache/conftool/dbconfig/20220203-113907-marostegui.json
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20057 and previous config saved to /var/cache/conftool/dbconfig/20220203-113859-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20056 and previous config saved to /var/cache/conftool/dbconfig/20220203-113815-marostegui.json
  • 11:36 arturo: reprepro changes @ apt1001 after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/758050
  • 11:33 moritzm: draining ganeti1020 for eventual reimage
  • 11:26 vgutierrez: rolling varnish-fe restart to catch the new listen_depth config value
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20055 and previous config saved to /var/cache/conftool/dbconfig/20220203-112355-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300402)', diff saved to https://phabricator.wikimedia.org/P20054 and previous config saved to /var/cache/conftool/dbconfig/20220203-112311-marostegui.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300402)', diff saved to https://phabricator.wikimedia.org/P20053 and previous config saved to /var/cache/conftool/dbconfig/20220203-111921-marostegui.json
  • 11:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20052 and previous config saved to /var/cache/conftool/dbconfig/20220203-111908-marostegui.json
  • 11:15 topranks: Adding BGP peering to lsw1-f1-eqiad on cr2-eqiad. T299758.
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20051 and previous config saved to /var/cache/conftool/dbconfig/20220203-110850-marostegui.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20050 and previous config saved to /var/cache/conftool/dbconfig/20220203-110403-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20049 and previous config saved to /var/cache/conftool/dbconfig/20220203-105345-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298558)', diff saved to https://phabricator.wikimedia.org/P20048 and previous config saved to /var/cache/conftool/dbconfig/20220203-105238-marostegui.json
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298558)', diff saved to https://phabricator.wikimedia.org/P20047 and previous config saved to /var/cache/conftool/dbconfig/20220203-105230-marostegui.json
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20046 and previous config saved to /var/cache/conftool/dbconfig/20220203-104858-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20045 and previous config saved to /var/cache/conftool/dbconfig/20220203-103725-marostegui.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20044 and previous config saved to /var/cache/conftool/dbconfig/20220203-103354-marostegui.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20043 and previous config saved to /var/cache/conftool/dbconfig/20220203-103008-marostegui.json
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300402)', diff saved to https://phabricator.wikimedia.org/P20042 and previous config saved to /var/cache/conftool/dbconfig/20220203-103001-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20041 and previous config saved to /var/cache/conftool/dbconfig/20220203-102221-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20040 and previous config saved to /var/cache/conftool/dbconfig/20220203-101456-marostegui.json
  • 10:07 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1015.eqiad.wmnet
  • 10:07 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1014.eqiad.wmnet
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298558)', diff saved to https://phabricator.wikimedia.org/P20039 and previous config saved to /var/cache/conftool/dbconfig/20220203-100716-marostegui.json
  • 10:07 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1013.eqiad.wmnet
  • 10:07 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1012.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1010.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1015.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1014.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1013.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1012.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1011.eqiad.wmnet
  • 10:06 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs1010.eqiad.wmnet
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20038 and previous config saved to /var/cache/conftool/dbconfig/20220203-095952-marostegui.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298558)', diff saved to https://phabricator.wikimedia.org/P20037 and previous config saved to /var/cache/conftool/dbconfig/20220203-095907-marostegui.json
  • 09:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298558)', diff saved to https://phabricator.wikimedia.org/P20036 and previous config saved to /var/cache/conftool/dbconfig/20220203-095859-marostegui.json
  • 09:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1183.eqiad.wmnet with OS bullseye
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300402)', diff saved to https://phabricator.wikimedia.org/P20034 and previous config saved to /var/cache/conftool/dbconfig/20220203-094447-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20033 and previous config saved to /var/cache/conftool/dbconfig/20220203-094354-marostegui.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300402)', diff saved to https://phabricator.wikimedia.org/P20032 and previous config saved to /var/cache/conftool/dbconfig/20220203-094107-marostegui.json
  • 09:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20031 and previous config saved to /var/cache/conftool/dbconfig/20220203-094059-marostegui.json
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1183.eqiad.wmnet with OS bullseye
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20030 and previous config saved to /var/cache/conftool/dbconfig/20220203-092850-marostegui.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20029 and previous config saved to /var/cache/conftool/dbconfig/20220203-092554-marostegui.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298558)', diff saved to https://phabricator.wikimedia.org/P20028 and previous config saved to /var/cache/conftool/dbconfig/20220203-091345-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298558)', diff saved to https://phabricator.wikimedia.org/P20027 and previous config saved to /var/cache/conftool/dbconfig/20220203-091237-marostegui.json
  • 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298558)', diff saved to https://phabricator.wikimedia.org/P20026 and previous config saved to /var/cache/conftool/dbconfig/20220203-091224-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20025 and previous config saved to /var/cache/conftool/dbconfig/20220203-091050-marostegui.json
  • 09:00 marostegui: Failover m2 from db1183 to db1159 - T300329
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20024 and previous config saved to /var/cache/conftool/dbconfig/20220203-085720-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20023 and previous config saved to /var/cache/conftool/dbconfig/20220203-085545-marostegui.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300402)', diff saved to https://phabricator.wikimedia.org/P20022 and previous config saved to /var/cache/conftool/dbconfig/20220203-085159-marostegui.json
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300402)', diff saved to https://phabricator.wikimedia.org/P20021 and previous config saved to /var/cache/conftool/dbconfig/20220203-085151-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20020 and previous config saved to /var/cache/conftool/dbconfig/20220203-084215-marostegui.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20019 and previous config saved to /var/cache/conftool/dbconfig/20220203-083647-marostegui.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298558)', diff saved to https://phabricator.wikimedia.org/P20018 and previous config saved to /var/cache/conftool/dbconfig/20220203-082710-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298558)', diff saved to https://phabricator.wikimedia.org/P20017 and previous config saved to /var/cache/conftool/dbconfig/20220203-082302-marostegui.json
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298558)', diff saved to https://phabricator.wikimedia.org/P20016 and previous config saved to /var/cache/conftool/dbconfig/20220203-082249-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20015 and previous config saved to /var/cache/conftool/dbconfig/20220203-082142-marostegui.json
  • 08:10 dcausse: restarting blazegraph on wdqs1013 (jvm stuck for 5hours)
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20014 and previous config saved to /var/cache/conftool/dbconfig/20220203-080745-marostegui.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300402)', diff saved to https://phabricator.wikimedia.org/P20013 and previous config saved to /var/cache/conftool/dbconfig/20220203-080637-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300402)', diff saved to https://phabricator.wikimedia.org/P20012 and previous config saved to /var/cache/conftool/dbconfig/20220203-080254-marostegui.json
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300402)', diff saved to https://phabricator.wikimedia.org/P20011 and previous config saved to /var/cache/conftool/dbconfig/20220203-080247-marostegui.json
  • 07:55 _joe_: restarted php-fpm on wtp1029, segfaulting
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20010 and previous config saved to /var/cache/conftool/dbconfig/20220203-075240-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20009 and previous config saved to /var/cache/conftool/dbconfig/20220203-074742-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298558)', diff saved to https://phabricator.wikimedia.org/P20008 and previous config saved to /var/cache/conftool/dbconfig/20220203-073735-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20007 and previous config saved to /var/cache/conftool/dbconfig/20220203-073237-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298558)', diff saved to https://phabricator.wikimedia.org/P20006 and previous config saved to /var/cache/conftool/dbconfig/20220203-073129-marostegui.json
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2078,2133].codfw.wmnet,db[1117,1159,1183].eqiad.wmnet with reason: Switchover m2 T300329
  • 07:23 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2078,2133].codfw.wmnet,db[1117,1159,1183].eqiad.wmnet with reason: Switchover m2 T300329
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300402)', diff saved to https://phabricator.wikimedia.org/P20005 and previous config saved to /var/cache/conftool/dbconfig/20220203-071732-marostegui.json
  • 07:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300402)', diff saved to https://phabricator.wikimedia.org/P20004 and previous config saved to /var/cache/conftool/dbconfig/20220203-071348-marostegui.json
  • 07:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300402)', diff saved to https://phabricator.wikimedia.org/P20003 and previous config saved to /var/cache/conftool/dbconfig/20220203-071141-marostegui.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298558)', diff saved to https://phabricator.wikimedia.org/P20002 and previous config saved to /var/cache/conftool/dbconfig/20220203-071111-marostegui.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20001 and previous config saved to /var/cache/conftool/dbconfig/20220203-065636-marostegui.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20000 and previous config saved to /var/cache/conftool/dbconfig/20220203-065606-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19999 and previous config saved to /var/cache/conftool/dbconfig/20220203-064131-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19998 and previous config saved to /var/cache/conftool/dbconfig/20220203-064101-marostegui.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300402)', diff saved to https://phabricator.wikimedia.org/P19997 and previous config saved to /var/cache/conftool/dbconfig/20220203-062627-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298558)', diff saved to https://phabricator.wikimedia.org/P19996 and previous config saved to /var/cache/conftool/dbconfig/20220203-062556-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300402)', diff saved to https://phabricator.wikimedia.org/P19995 and previous config saved to /var/cache/conftool/dbconfig/20220203-062243-marostegui.json
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298558)', diff saved to https://phabricator.wikimedia.org/P19994 and previous config saved to /var/cache/conftool/dbconfig/20220203-061703-marostegui.json
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 01:12 brennen: UTC late backport window finished
  • 01:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2029.codfw.wmnet with OS buster
  • 01:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:09 brennen@deploy1002: Finished scap: Backports: Changes the labels of the Vector skins (T299927) and Pass skin name to Hooks::isSkinLegacy (T299971) (duration: 24m 48s)
  • 01:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:44 brennen@deploy1002: Started scap: Backports: Changes the labels of the Vector skins (T299927) and Pass skin name to Hooks::isSkinLegacy (T299971)
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS buster

2022-02-02

  • 22:26 mutante: gitlab - introducing parameter to fetch TLS certs either with acmechief or certbot (if in cloud). Boolean $use_acmechief = lookup('profile::gitlab::use_acmechief'), confirmed noop in prod on gitlab1001.wikimedia.org ( T297411)
  • 21:36 ejegg: updated CiviCRM from 2bd5fb5e to 7dcdc017
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:04 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.20 refs T293961 (duration: 00m 49s)
  • 20:03 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.20 refs T293961
  • 19:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:49 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cowikimedia: Allow bureaucrats to remove sysop and bureaucrat flags (T300779) (duration: 00m 50s)
  • 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:42 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: multiversion: Improve error message if wikiversions.php has wrong format (duration: 00m 49s)
  • 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 62b2acb: Migration mode enabled everywhere (T299927) (duration: 00m 49s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:27 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/includes/SkinVector.php: bdc20dd: Fix the opt in URl (T300097) (duration: 00m 49s)
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:24 urbanecm@deploy1002: Synchronized wmf-config/: a48f8bd: Migrate calls of wmf* constants to wmg* constants (T45956) (duration: 00m 51s)
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300402)', diff saved to https://phabricator.wikimedia.org/P19993 and previous config saved to /var/cache/conftool/dbconfig/20220202-191918-marostegui.json
  • 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:14 urbanecm@deploy1002: Synchronized multiversion/buildConfigCache.php: 83f1f6a: Consistently write to $wmgRealm the same value as to $wmfRealm (T45956) (duration: 00m 49s)
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{kywiki,kywiki-1.5x,kywiki-2x}.png (T300241)
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:09 topranks: Running homer to enable interface et-1/0/2 on cr1-eqiad (towards lsw1-e1-eqiad) to test connectivity.
  • 19:09 urbanecm@deploy1002: Synchronized logos/config.yaml: 335cbee: kywiki: update logo (3/3; T300241) (duration: 00m 49s)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:08 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 335cbee: kywiki: update logo (2/3; T300241) (duration: 00m 53s)
  • 19:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 335cbee: kywiki: update logo (1/3; T300241) (duration: 00m 50s)
  • 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19992 and previous config saved to /var/cache/conftool/dbconfig/20220202-190414-marostegui.json
  • 18:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19991 and previous config saved to /var/cache/conftool/dbconfig/20220202-184909-marostegui.json
  • 18:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300402)', diff saved to https://phabricator.wikimedia.org/P19990 and previous config saved to /var/cache/conftool/dbconfig/20220202-183404-marostegui.json
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300402)', diff saved to https://phabricator.wikimedia.org/P19989 and previous config saved to /var/cache/conftool/dbconfig/20220202-183034-marostegui.json
  • 18:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300402)', diff saved to https://phabricator.wikimedia.org/P19988 and previous config saved to /var/cache/conftool/dbconfig/20220202-183027-marostegui.json
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 18:16 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.20/includes/filerepo/file/ForeignAPIFile.php: Backport: Revert "Support audio on filepage in InstantCommons" (T300751) (duration: 00m 51s)
  • 18:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19987 and previous config saved to /var/cache/conftool/dbconfig/20220202-181522-marostegui.json
  • 18:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19986 and previous config saved to /var/cache/conftool/dbconfig/20220202-180018-marostegui.json
  • 17:45 cwhite: end logstash upgrade (codfw) T299168
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300402)', diff saved to https://phabricator.wikimedia.org/P19985 and previous config saved to /var/cache/conftool/dbconfig/20220202-174513-marostegui.json
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300402)', diff saved to https://phabricator.wikimedia.org/P19984 and previous config saved to /var/cache/conftool/dbconfig/20220202-174138-marostegui.json
  • 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300402)', diff saved to https://phabricator.wikimedia.org/P19983 and previous config saved to /var/cache/conftool/dbconfig/20220202-174125-marostegui.json
  • 17:32 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 17:26 cwhite: begin logstash upgrade (codfw) T299168
  • 17:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19982 and previous config saved to /var/cache/conftool/dbconfig/20220202-172620-marostegui.json
  • 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19981 and previous config saved to /var/cache/conftool/dbconfig/20220202-171115-marostegui.json
  • 16:59 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 16:59 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 16:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300402)', diff saved to https://phabricator.wikimedia.org/P19979 and previous config saved to /var/cache/conftool/dbconfig/20220202-165611-marostegui.json
  • 16:47 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2012.codfw.wmnet
  • 16:47 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2011.codfw.wmnet
  • 16:47 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2010.codfw.wmnet
  • 16:46 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2012.codfw.wmnet
  • 16:46 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2011.codfw.wmnet
  • 16:46 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2010.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2012.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2011.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2010.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2012.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2011.codfw.wmnet
  • 16:45 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2010.codfw.wmnet
  • 16:42 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2005.codfw.wmnet
  • 16:42 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2006.codfw.wmnet
  • 16:42 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2007.codfw.wmnet
  • 16:42 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2008.codfw.wmnet
  • 16:41 Emperor: standardising nginx weights for codfw swift proxies to match eqiad ones T300738
  • 16:41 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
  • 16:41 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2009.codfw.wmnet
  • 16:39 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 16:38 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 16:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 16:27 mvernon@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 16:26 mvernon@puppetmaster1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300402)', diff saved to https://phabricator.wikimedia.org/P19977 and previous config saved to /var/cache/conftool/dbconfig/20220202-162435-marostegui.json
  • 16:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19976 and previous config saved to /var/cache/conftool/dbconfig/20220202-162428-marostegui.json
  • 16:24 jbond: disable ldap email checks on mx2001
  • 16:19 Emperor: rolling restart of swift frontends to bring new ones into service T300738
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19975 and previous config saved to /var/cache/conftool/dbconfig/20220202-160923-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19974 and previous config saved to /var/cache/conftool/dbconfig/20220202-155418-marostegui.json
  • 15:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:44 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:43 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:43 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:41 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 15:41 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19973 and previous config saved to /var/cache/conftool/dbconfig/20220202-153913-marostegui.json
  • 15:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
  • 15:37 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:35 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:35 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 15:34 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:32 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:32 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19972 and previous config saved to /var/cache/conftool/dbconfig/20220202-153206-marostegui.json
  • 15:32 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
  • 15:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:32 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 15:30 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300402)', diff saved to https://phabricator.wikimedia.org/P19970 and previous config saved to /var/cache/conftool/dbconfig/20220202-152552-marostegui.json
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19969 and previous config saved to /var/cache/conftool/dbconfig/20220202-151047-marostegui.json
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19968 and previous config saved to /var/cache/conftool/dbconfig/20220202-150832-root.json
  • 15:00 XioNoX: esams: push Capirca generated loopback filters
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19967 and previous config saved to /var/cache/conftool/dbconfig/20220202-145542-marostegui.json
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19966 and previous config saved to /var/cache/conftool/dbconfig/20220202-145329-root.json
  • 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 XioNoX: codfw: push Capirca generated loopback filters
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300402)', diff saved to https://phabricator.wikimedia.org/P19965 and previous config saved to /var/cache/conftool/dbconfig/20220202-144038-marostegui.json
  • 14:39 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19963 and previous config saved to /var/cache/conftool/dbconfig/20220202-143825-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300402)', diff saved to https://phabricator.wikimedia.org/P19962 and previous config saved to /var/cache/conftool/dbconfig/20220202-143221-marostegui.json
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19961 and previous config saved to /var/cache/conftool/dbconfig/20220202-143214-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19960 and previous config saved to /var/cache/conftool/dbconfig/20220202-142321-root.json
  • 14:21 XioNoX: eqsin: push Capirca generated loopback filters
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19959 and previous config saved to /var/cache/conftool/dbconfig/20220202-141709-marostegui.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:15 XioNoX: cr2-eqdfw: push Capirca generated loopback filters
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1020 - as it is the master', diff saved to https://phabricator.wikimedia.org/P19958 and previous config saved to /var/cache/conftool/dbconfig/20220202-141455-marostegui.json
  • 14:13 vgutierrez: pool cp1087 running envoy as TLS terminator - T271421
  • 14:09 XioNoX: cr2-eqord: push Capirca generated loopback filters
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19957 and previous config saved to /var/cache/conftool/dbconfig/20220202-140818-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 schema change', diff saved to https://phabricator.wikimedia.org/P19956 and previous config saved to /var/cache/conftool/dbconfig/20220202-140317-marostegui.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19955 and previous config saved to /var/cache/conftool/dbconfig/20220202-140239-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19954 and previous config saved to /var/cache/conftool/dbconfig/20220202-140204-marostegui.json
  • 13:50 elukey: move docker on ml-serve-ctrl* nodes from device mapper to overlay2
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19953 and previous config saved to /var/cache/conftool/dbconfig/20220202-134735-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19952 and previous config saved to /var/cache/conftool/dbconfig/20220202-134659-marostegui.json
  • 13:40 XioNoX: ULSFO routers: push Capirca generated loopback filters
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19951 and previous config saved to /var/cache/conftool/dbconfig/20220202-133713-marostegui.json
  • 13:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:35 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on production
  • 13:34 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on canary
  • 13:34 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync on production
  • 13:34 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync on canary
  • 13:33 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on canary
  • 13:33 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on production
  • 13:32 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync on production
  • 13:32 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync on canary
  • 13:32 ottomata: roll restarting eventgate-main to pick up stream-configs for rdf-streaming-updater.reconcile
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19949 and previous config saved to /var/cache/conftool/dbconfig/20220202-133231-root.json
  • 13:31 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on canary
  • 13:31 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync on canary
  • 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 13:30 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 13:29 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync on production
  • 13:28 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync on canary
  • 13:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync on production
  • 13:25 XioNoX: rename cr3-ulsfo loopback terms in preparation of move to Capirca
  • 13:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19947 and previous config saved to /var/cache/conftool/dbconfig/20220202-132510-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19946 and previous config saved to /var/cache/conftool/dbconfig/20220202-131728-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19945 and previous config saved to /var/cache/conftool/dbconfig/20220202-131006-marostegui.json
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19944 and previous config saved to /var/cache/conftool/dbconfig/20220202-130224-root.json
  • 12:59 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: ULS: Remove unused ULSEventLogging variable (T275894) (duration: 00m 49s)
  • 12:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19942 and previous config saved to /var/cache/conftool/dbconfig/20220202-125500-marostegui.json
  • 12:54 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Clean-up decommisioned Print schema configs (T196159) (duration: 00m 50s)
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19941 and previous config saved to /var/cache/conftool/dbconfig/20220202-125034-root.json
  • 12:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298558)', diff saved to https://phabricator.wikimedia.org/P19940 and previous config saved to /var/cache/conftool/dbconfig/20220202-124122-marostegui.json
  • 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298558)', diff saved to https://phabricator.wikimedia.org/P19939 and previous config saved to /var/cache/conftool/dbconfig/20220202-124115-marostegui.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19938 and previous config saved to /var/cache/conftool/dbconfig/20220202-123956-marostegui.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19937 and previous config saved to /var/cache/conftool/dbconfig/20220202-123531-root.json
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1019.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1019.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300402)', diff saved to https://phabricator.wikimedia.org/P19936 and previous config saved to /var/cache/conftool/dbconfig/20220202-123127-marostegui.json
  • 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19934 and previous config saved to /var/cache/conftool/dbconfig/20220202-122610-marostegui.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300402)', diff saved to https://phabricator.wikimedia.org/P19933 and previous config saved to /var/cache/conftool/dbconfig/20220202-122112-marostegui.json
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 65%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19932 and previous config saved to /var/cache/conftool/dbconfig/20220202-122027-root.json
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: prod: READ_NEW for CentralAuth hidden level migration (T289068) (duration: 00m 50s)
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19930 and previous config saved to /var/cache/conftool/dbconfig/20220202-121105-marostegui.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19929 and previous config saved to /var/cache/conftool/dbconfig/20220202-120608-marostegui.json
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19928 and previous config saved to /var/cache/conftool/dbconfig/20220202-120524-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298558)', diff saved to https://phabricator.wikimedia.org/P19927 and previous config saved to /var/cache/conftool/dbconfig/20220202-115601-marostegui.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19926 and previous config saved to /var/cache/conftool/dbconfig/20220202-115103-marostegui.json
  • 11:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19925 and previous config saved to /var/cache/conftool/dbconfig/20220202-115020-root.json
  • 11:48 vgutierrez: depool cp1087 to be reimaged as cache::text_envoy - T271421
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298558)', diff saved to https://phabricator.wikimedia.org/P19924 and previous config saved to /var/cache/conftool/dbconfig/20220202-114639-marostegui.json
  • 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:45 _joe_: repooling thanos-fe1001 T300119
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300402)', diff saved to https://phabricator.wikimedia.org/P19923 and previous config saved to /var/cache/conftool/dbconfig/20220202-113558-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19922 and previous config saved to /var/cache/conftool/dbconfig/20220202-113516-root.json
  • 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298558)', diff saved to https://phabricator.wikimedia.org/P19921 and previous config saved to /var/cache/conftool/dbconfig/20220202-113007-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300402)', diff saved to https://phabricator.wikimedia.org/P19920 and previous config saved to /var/cache/conftool/dbconfig/20220202-112849-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:28 _joe_: depooling thanos-fe1001 for testing T300119
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 15%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19919 and previous config saved to /var/cache/conftool/dbconfig/20220202-112013-root.json
  • 11:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300402)', diff saved to https://phabricator.wikimedia.org/P19918 and previous config saved to /var/cache/conftool/dbconfig/20220202-111804-marostegui.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19917 and previous config saved to /var/cache/conftool/dbconfig/20220202-111502-marostegui.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19916 and previous config saved to /var/cache/conftool/dbconfig/20220202-110509-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19915 and previous config saved to /var/cache/conftool/dbconfig/20220202-110259-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19914 and previous config saved to /var/cache/conftool/dbconfig/20220202-105957-marostegui.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19913 and previous config saved to /var/cache/conftool/dbconfig/20220202-105006-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19912 and previous config saved to /var/cache/conftool/dbconfig/20220202-104755-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298558)', diff saved to https://phabricator.wikimedia.org/P19911 and previous config saved to /var/cache/conftool/dbconfig/20220202-104453-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges and recentchanges groups from s4 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19910 and previous config saved to /var/cache/conftool/dbconfig/20220202-103830-marostegui.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19909 and previous config saved to /var/cache/conftool/dbconfig/20220202-103502-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1021 after reimage', diff saved to https://phabricator.wikimedia.org/P19908 and previous config saved to /var/cache/conftool/dbconfig/20220202-103436-marostegui.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19907 and previous config saved to /var/cache/conftool/dbconfig/20220202-103401-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300402)', diff saved to https://phabricator.wikimedia.org/P19906 and previous config saved to /var/cache/conftool/dbconfig/20220202-103250-marostegui.json
  • 10:28 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:27 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298558)', diff saved to https://phabricator.wikimedia.org/P19905 and previous config saved to /var/cache/conftool/dbconfig/20220202-102717-marostegui.json
  • 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:23 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:22 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1008.eqiad.wmnet with OS buster
  • 10:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1021.eqiad.wmnet with OS bullseye
  • 10:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 10:09 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:06 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 04s)
  • 10:06 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 10:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1008.eqiad.wmnet with OS buster
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1019.eqiad.wmnet with OS buster
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1021.eqiad.wmnet with OS bullseye
  • 09:39 moritzm: installing apache/apache-modsecurity2 security updates
  • 09:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1021.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300402)', diff saved to https://phabricator.wikimedia.org/P19904 and previous config saved to /var/cache/conftool/dbconfig/20220202-093231-marostegui.json
  • 09:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300402)', diff saved to https://phabricator.wikimedia.org/P19903 and previous config saved to /var/cache/conftool/dbconfig/20220202-093223-marostegui.json
  • 09:28 marostegui@cumin1001: START - Cookbook sre.hosts.provision for host es1021.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19902 and previous config saved to /var/cache/conftool/dbconfig/20220202-091718-marostegui.json
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS buster
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 T300127', diff saved to https://phabricator.wikimedia.org/P19901 and previous config saved to /var/cache/conftool/dbconfig/20220202-091355-marostegui.json
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 09:10 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 09:08 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 09:07 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es4 T300127 (duration: 00m 50s)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19900 and previous config saved to /var/cache/conftool/dbconfig/20220202-090214-marostegui.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary and set section read-write T300127', diff saved to https://phabricator.wikimedia.org/P19899 and previous config saved to /var/cache/conftool/dbconfig/20220202-090121-marostegui.json
  • 09:00 marostegui: Starting es4 eqiad failover from es1021 to es1020 - T300127
  • 08:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 08:52 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 08:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Switchover es4 T300127
  • 08:48 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Switchover es4 T300127
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300402)', diff saved to https://phabricator.wikimedia.org/P19898 and previous config saved to /var/cache/conftool/dbconfig/20220202-084709-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300402)', diff saved to https://phabricator.wikimedia.org/P19897 and previous config saved to /var/cache/conftool/dbconfig/20220202-084150-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300402)', diff saved to https://phabricator.wikimedia.org/P19896 and previous config saved to /var/cache/conftool/dbconfig/20220202-084143-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19895 and previous config saved to /var/cache/conftool/dbconfig/20220202-082638-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19894 and previous config saved to /var/cache/conftool/dbconfig/20220202-081134-marostegui.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300402)', diff saved to https://phabricator.wikimedia.org/P19893 and previous config saved to /var/cache/conftool/dbconfig/20220202-075629-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300402)', diff saved to https://phabricator.wikimedia.org/P19892 and previous config saved to /var/cache/conftool/dbconfig/20220202-075244-marostegui.json
  • 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300402)', diff saved to https://phabricator.wikimedia.org/P19891 and previous config saved to /var/cache/conftool/dbconfig/20220202-075236-marostegui.json
  • 07:51 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: update wmf-proxy-dashboard (eqiad1) (duration: 04m 09s)
  • 07:47 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: update wmf-proxy-dashboard (eqiad1)
  • 07:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:44 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: update wmf-proxy-dashboard (duration: 02m 19s)
  • 07:42 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: update wmf-proxy-dashboard
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 10 T300127', diff saved to https://phabricator.wikimedia.org/P19890 and previous config saved to /var/cache/conftool/dbconfig/20220202-073918-root.json
  • 07:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switchover es4 T300127
  • 07:38 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switchover es4 T300127
  • 07:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19889 and previous config saved to /var/cache/conftool/dbconfig/20220202-073731-marostegui.json
  • 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:36 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es4 T300127 (duration: 00m 50s)
  • 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:30 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Disable writes on es4 T300127 (duration: 00m 51s)
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19888 and previous config saved to /var/cache/conftool/dbconfig/20220202-072227-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300402)', diff saved to https://phabricator.wikimedia.org/P19887 and previous config saved to /var/cache/conftool/dbconfig/20220202-070722-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300402)', diff saved to https://phabricator.wikimedia.org/P19886 and previous config saved to /var/cache/conftool/dbconfig/20220202-070012-marostegui.json
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:54 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2008.codfw.wmnet with OS buster
  • 02:19 ejegg: updated CiviCRM from 0513f1b7 to 3d379e25
  • 01:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2008.codfw.wmnet with OS buster
  • 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2007.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS buster
  • 01:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2007.codfw.wmnet with OS buster
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS buster
  • 01:12 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2007.codfw.wmnet with OS buster
  • 01:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:03 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: rdf-streaming-updater: add the reconciliation stream (T279541) (duration: 00m 49s)
  • 00:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS buster
  • 00:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:51 urbanecm: UTC late B&C window completed
  • 00:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b560843: Add wgUploadNavigationUrl upload page of ptwikinews (T300466) (duration: 00m 50s)
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2006.codfw.wmnet with OS buster
  • 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:40 urbanecm@deploy1002: Synchronized docroot/noc/db.php: 06444c1: Start writing to some wmg* constants (T45956; 2/2) (duration: 00m 49s)
  • 00:39 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 06444c1: Start writing to some wmg* constants (T45956; 1/2) (duration: 00m 49s)
  • 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b2c13c6: Enable migration mode on all group 0, group 1 and desktop-improvement wikis (T299927) (duration: 01m 58s)
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2006.codfw.wmnet with OS buster
  • 00:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-02-01

  • 22:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS buster
  • 22:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2002-dev.codfw.wmnet with OS bullseye
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS buster
  • 22:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-serve2005.codfw.wmnet with OS buster
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS buster
  • 21:55 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2002-dev.codfw.wmnet with OS bullseye
  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:14 Lucas_WMDE: Deployed patch for T297754
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:42 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.17 (duration: 01m 35s)
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.20 refs T293961
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298558)', diff saved to https://phabricator.wikimedia.org/P19884 and previous config saved to /var/cache/conftool/dbconfig/20220201-202806-marostegui.json
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:21 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.18 (duration: 04m 08s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:20 ejegg: updated payments-wiki from 933e8669 to dbcb5254
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19882 and previous config saved to /var/cache/conftool/dbconfig/20220201-201259-marostegui.json
  • 20:12 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.20 refs T293961 (duration: 51m 42s)
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19881 and previous config saved to /var/cache/conftool/dbconfig/20220201-195755-marostegui.json
  • 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:55 joal@deploy1002: Finished deploy [analytics/refinery@6a7983e] (hadoop-test): Hotfix analytics weekly train TEST [analytics/refinery@6a7983e] (duration: 05m 51s)
  • 19:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:49 joal@deploy1002: Started deploy [analytics/refinery@6a7983e] (hadoop-test): Hotfix analytics weekly train TEST [analytics/refinery@6a7983e]
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298558)', diff saved to https://phabricator.wikimedia.org/P19880 and previous config saved to /var/cache/conftool/dbconfig/20220201-194250-marostegui.json
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298558)', diff saved to https://phabricator.wikimedia.org/P19879 and previous config saved to /var/cache/conftool/dbconfig/20220201-194144-marostegui.json
  • 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298558)', diff saved to https://phabricator.wikimedia.org/P19878 and previous config saved to /var/cache/conftool/dbconfig/20220201-194136-marostegui.json
  • 19:40 joal@deploy1002: Finished deploy [analytics/refinery@6a7983e] (thin): Hotfix analytics weekly train THIN [analytics/refinery@6a7983e] (duration: 00m 07s)
  • 19:40 joal@deploy1002: Started deploy [analytics/refinery@6a7983e] (thin): Hotfix analytics weekly train THIN [analytics/refinery@6a7983e]
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19877 and previous config saved to /var/cache/conftool/dbconfig/20220201-192632-marostegui.json
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:22 joal@deploy1002: Finished deploy [analytics/refinery@6a7983e]: Hotfix analytics weekly train [analytics/refinery@6a7983e] (duration: 19m 09s)
  • 19:20 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.20 refs T293961
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS buster
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19876 and previous config saved to /var/cache/conftool/dbconfig/20220201-191127-marostegui.json
  • 19:02 joal@deploy1002: Started deploy [analytics/refinery@6a7983e]: Hotfix analytics weekly train [analytics/refinery@6a7983e]
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298558)', diff saved to https://phabricator.wikimedia.org/P19875 and previous config saved to /var/cache/conftool/dbconfig/20220201-185622-marostegui.json
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298558)', diff saved to https://phabricator.wikimedia.org/P19874 and previous config saved to /var/cache/conftool/dbconfig/20220201-185516-marostegui.json
  • 18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298558)', diff saved to https://phabricator.wikimedia.org/P19873 and previous config saved to /var/cache/conftool/dbconfig/20220201-185507-marostegui.json
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS buster
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS buster
  • 18:40 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on production
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19872 and previous config saved to /var/cache/conftool/dbconfig/20220201-184027-root.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19871 and previous config saved to /var/cache/conftool/dbconfig/20220201-184002-marostegui.json
  • 18:38 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on canary
  • 18:38 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on canary
  • 18:38 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on production
  • 18:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on production
  • 18:35 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on canary
  • 18:33 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on canary
  • 18:33 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on production
  • 18:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync on production
  • 18:29 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply on canary
  • 18:29 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply on production
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19870 and previous config saved to /var/cache/conftool/dbconfig/20220201-182523-root.json
  • 18:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS buster
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19869 and previous config saved to /var/cache/conftool/dbconfig/20220201-182458-marostegui.json
  • 18:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2001.codfw.wmnet with OS buster
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19868 and previous config saved to /var/cache/conftool/dbconfig/20220201-181019-root.json
  • 18:10 cwhite: end logstash upgrade (eqiad) T299168
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298558)', diff saved to https://phabricator.wikimedia.org/P19867 and previous config saved to /var/cache/conftool/dbconfig/20220201-180953-marostegui.json
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298558)', diff saved to https://phabricator.wikimedia.org/P19866 and previous config saved to /var/cache/conftool/dbconfig/20220201-180847-marostegui.json
  • 18:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298558)', diff saved to https://phabricator.wikimedia.org/P19865 and previous config saved to /var/cache/conftool/dbconfig/20220201-180839-marostegui.json
  • 18:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2017.wmnet
  • 18:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2017.codfw.wmnet with OS buster
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS buster
  • 17:57 urbanecm@deploy1002: Synchronized wmf-config/config/amiwiki.yaml: 7f8bc6d: amiwiki: Deploy Growth features in dark mode (3/3) (duration: 00m 49s)
  • 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 17:56 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 7f8bc6d: amiwiki: Deploy Growth features in dark mode (2/3) (duration: 00m 50s)
  • 17:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7f8bc6d: amiwiki: Deploy Growth features in dark mode (1/3) (duration: 00m 51s)
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19864 and previous config saved to /var/cache/conftool/dbconfig/20220201-175516-root.json
  • 17:54 btullis@deploy1002: Finished deploy [analytics/refinery@c24f002] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c24f002] (duration: 05m 41s)
  • 17:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19863 and previous config saved to /var/cache/conftool/dbconfig/20220201-175334-marostegui.json
  • 17:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:52 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php amiwiki
  • 17:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php amiwiki growthexperiments
  • 17:49 btullis@deploy1002: Started deploy [analytics/refinery@c24f002] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c24f002]
  • 17:48 btullis@deploy1002: Finished deploy [analytics/refinery@c24f002] (thin): Regular analytics weekly train THIN [analytics/refinery@c24f002] (duration: 00m 07s)
  • 17:48 btullis@deploy1002: Started deploy [analytics/refinery@c24f002] (thin): Regular analytics weekly train THIN [analytics/refinery@c24f002]
  • 17:47 cwhite: begin logstash upgrade (eqiad) T299168
  • 17:42 btullis@deploy1002: Finished deploy [analytics/refinery@c24f002]: Regular analytics weekly train [analytics/refinery@c24f002] (duration: 11m 29s)
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19862 and previous config saved to /var/cache/conftool/dbconfig/20220201-174012-root.json
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19861 and previous config saved to /var/cache/conftool/dbconfig/20220201-173830-marostegui.json
  • 17:30 btullis@deploy1002: Started deploy [analytics/refinery@c24f002]: Regular analytics weekly train [analytics/refinery@c24f002]
  • 17:29 btullis: about to deploy analytics/refinery
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19860 and previous config saved to /var/cache/conftool/dbconfig/20220201-172509-root.json
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298558)', diff saved to https://phabricator.wikimedia.org/P19859 and previous config saved to /var/cache/conftool/dbconfig/20220201-172325-marostegui.json
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298558)', diff saved to https://phabricator.wikimedia.org/P19858 and previous config saved to /var/cache/conftool/dbconfig/20220201-172219-marostegui.json
  • 17:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 17:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19857 and previous config saved to /var/cache/conftool/dbconfig/20220201-172205-marostegui.json
  • 17:21 vgutierrez: pool cp2039 running envoy as TLS terminator - T271421
  • 17:17 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19856 and previous config saved to /var/cache/conftool/dbconfig/20220201-171005-root.json
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19855 and previous config saved to /var/cache/conftool/dbconfig/20220201-170701-marostegui.json
  • 16:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19854 and previous config saved to /var/cache/conftool/dbconfig/20220201-165501-root.json
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19852 and previous config saved to /var/cache/conftool/dbconfig/20220201-165156-marostegui.json
  • 16:51 papaul: rebooting pfw3a-codfw and pfw3b for JUNOS upgrade
  • 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2008.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2008.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19851 and previous config saved to /var/cache/conftool/dbconfig/20220201-163958-root.json
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19850 and previous config saved to /var/cache/conftool/dbconfig/20220201-163651-marostegui.json
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19849 and previous config saved to /var/cache/conftool/dbconfig/20220201-163545-marostegui.json
  • 16:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298558)', diff saved to https://phabricator.wikimedia.org/P19848 and previous config saved to /var/cache/conftool/dbconfig/20220201-163537-marostegui.json
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 1%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P19847 and previous config saved to /var/cache/conftool/dbconfig/20220201-162454-root.json
  • 16:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19846 and previous config saved to /var/cache/conftool/dbconfig/20220201-162033-marostegui.json
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300402)', diff saved to https://phabricator.wikimedia.org/P19845 and previous config saved to /var/cache/conftool/dbconfig/20220201-161353-marostegui.json
  • 16:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
  • 16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2007.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:10 vgutierrez: depool cp2039 to be reimaged as cache::text_envoy - T271421
  • 16:09 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
  • 16:09 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19844 and previous config saved to /var/cache/conftool/dbconfig/20220201-160528-marostegui.json
  • 16:05 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 10s)
  • 16:04 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2007.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298558)', diff saved to https://phabricator.wikimedia.org/P19843 and previous config saved to /var/cache/conftool/dbconfig/20220201-155023-marostegui.json
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298558)', diff saved to https://phabricator.wikimedia.org/P19842 and previous config saved to /var/cache/conftool/dbconfig/20220201-154716-marostegui.json
  • 15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19841 and previous config saved to /var/cache/conftool/dbconfig/20220201-154709-marostegui.json
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1010.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 15:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19840 and previous config saved to /var/cache/conftool/dbconfig/20220201-153204-marostegui.json
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19839 and previous config saved to /var/cache/conftool/dbconfig/20220201-152323-marostegui.json
  • 15:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 15:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1010.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19838 and previous config saved to /var/cache/conftool/dbconfig/20220201-151700-marostegui.json
  • 15:13 kart_: Deployed Flores MT for cxserver + Updated cxserver to 2022-01-13-174407-production (T298584, T292412, T292415, T298679, T298752) + Updated cxserver to 2022-02-01-141918-production (T298592)
  • 15:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 15:10 jelto: update scap to 4.2.2 on all hosts - T300392
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P19837 and previous config saved to /var/cache/conftool/dbconfig/20220201-150818-marostegui.json
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1016.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1016.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum6002.drmrs.wmnet
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19836 and previous config saved to /var/cache/conftool/dbconfig/20220201-150155-marostegui.json
  • 15:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: sync on production
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298558)', diff saved to https://phabricator.wikimedia.org/P19835 and previous config saved to /var/cache/conftool/dbconfig/20220201-150049-marostegui.json
  • 15:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298558)', diff saved to https://phabricator.wikimedia.org/P19834 and previous config saved to /var/cache/conftool/dbconfig/20220201-150041-marostegui.json
  • 14:59 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply on staging
  • 14:59 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply on production
  • 14:58 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: sync on production
  • 14:56 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply on staging
  • 14:56 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply on production
  • 14:53 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: sync on staging
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P19833 and previous config saved to /var/cache/conftool/dbconfig/20220201-145314-marostegui.json
  • 14:52 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply on production
  • 14:52 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply on staging
  • 14:52 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum6002.drmrs.wmnet
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19832 and previous config saved to /var/cache/conftool/dbconfig/20220201-144536-marostegui.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19831 and previous config saved to /var/cache/conftool/dbconfig/20220201-143809-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19830 and previous config saved to /var/cache/conftool/dbconfig/20220201-143504-marostegui.json
  • 14:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300402)', diff saved to https://phabricator.wikimedia.org/P19829 and previous config saved to /var/cache/conftool/dbconfig/20220201-143456-marostegui.json
  • 14:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19828 and previous config saved to /var/cache/conftool/dbconfig/20220201-143031-marostegui.json
  • 14:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P19827 and previous config saved to /var/cache/conftool/dbconfig/20220201-141952-marostegui.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298558)', diff saved to https://phabricator.wikimedia.org/P19826 and previous config saved to /var/cache/conftool/dbconfig/20220201-141527-marostegui.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298558)', diff saved to https://phabricator.wikimedia.org/P19825 and previous config saved to /var/cache/conftool/dbconfig/20220201-141420-marostegui.json
  • 14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298558)', diff saved to https://phabricator.wikimedia.org/P19824 and previous config saved to /var/cache/conftool/dbconfig/20220201-141413-marostegui.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P19823 and previous config saved to /var/cache/conftool/dbconfig/20220201-140447-marostegui.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19822 and previous config saved to /var/cache/conftool/dbconfig/20220201-135908-marostegui.json
  • 13:54 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 13:54 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:52 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on external
  • 13:50 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 13:50 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on internal
  • 13:50 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on external
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300402)', diff saved to https://phabricator.wikimedia.org/P19821 and previous config saved to /var/cache/conftool/dbconfig/20220201-134942-marostegui.json
  • 13:49 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 13:48 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:48 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on external
  • 13:47 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300402)', diff saved to https://phabricator.wikimedia.org/P19820 and previous config saved to /var/cache/conftool/dbconfig/20220201-134740-marostegui.json
  • 13:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:47 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 13:47 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on external
  • 13:47 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on internal
  • 13:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19819 and previous config saved to /var/cache/conftool/dbconfig/20220201-134524-marostegui.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19818 and previous config saved to /var/cache/conftool/dbconfig/20220201-134403-marostegui.json
  • 13:43 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 13:43 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 13:43 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 13:43 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 13:41 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:41 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 13:41 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 13:41 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 13:38 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 13:32 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P19817 and previous config saved to /var/cache/conftool/dbconfig/20220201-133020-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298558)', diff saved to https://phabricator.wikimedia.org/P19816 and previous config saved to /var/cache/conftool/dbconfig/20220201-132858-marostegui.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298558)', diff saved to https://phabricator.wikimedia.org/P19815 and previous config saved to /var/cache/conftool/dbconfig/20220201-132652-marostegui.json
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298558)', diff saved to https://phabricator.wikimedia.org/P19814 and previous config saved to /var/cache/conftool/dbconfig/20220201-132624-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P19813 and previous config saved to /var/cache/conftool/dbconfig/20220201-131515-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19812 and previous config saved to /var/cache/conftool/dbconfig/20220201-131119-marostegui.json
  • 13:09 hashar: Restarting CI Jenkins
  • 13:09 hashar: Restarting Gerrit
  • 13:01 hashar: Restarted Jenkins on releases1002.eqiad.wmnet
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19810 and previous config saved to /var/cache/conftool/dbconfig/20220201-130010-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19809 and previous config saved to /var/cache/conftool/dbconfig/20220201-125805-marostegui.json
  • 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19808 and previous config saved to /var/cache/conftool/dbconfig/20220201-125615-marostegui.json
  • 12:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui: Set innodb_adaptive_hash_index=OFF on: db1129 es1029 es1030 es1028 es1020 es1023 T268869
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19807 and previous config saved to /var/cache/conftool/dbconfig/20220201-125605-marostegui.json
  • 12:52 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum6001.drmrs.wmnet
  • 12:42 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum6001.drmrs.wmnet
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298558)', diff saved to https://phabricator.wikimedia.org/P19806 and previous config saved to /var/cache/conftool/dbconfig/20220201-124110-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P19805 and previous config saved to /var/cache/conftool/dbconfig/20220201-124100-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298558)', diff saved to https://phabricator.wikimedia.org/P19804 and previous config saved to /var/cache/conftool/dbconfig/20220201-124004-marostegui.json
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:39 moritzm: installing openjdk-11 security updates
  • 12:31 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: sync on production
  • 12:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply on staging
  • 12:30 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply on production
  • 12:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync on production
  • 12:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply on staging
  • 12:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply on production
  • 12:29 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
  • 12:28 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 12:28 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P19803 and previous config saved to /var/cache/conftool/dbconfig/20220201-122556-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19802 and previous config saved to /var/cache/conftool/dbconfig/20220201-121051-marostegui.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300402)', diff saved to https://phabricator.wikimedia.org/P19801 and previous config saved to /var/cache/conftool/dbconfig/20220201-120847-marostegui.json
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300402)', diff saved to https://phabricator.wikimedia.org/P19800 and previous config saved to /var/cache/conftool/dbconfig/20220201-120839-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298558)', diff saved to https://phabricator.wikimedia.org/P19799 and previous config saved to /var/cache/conftool/dbconfig/20220201-115923-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P19798 and previous config saved to /var/cache/conftool/dbconfig/20220201-115334-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19797 and previous config saved to /var/cache/conftool/dbconfig/20220201-114418-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P19796 and previous config saved to /var/cache/conftool/dbconfig/20220201-113830-marostegui.json
  • 11:31 elukey: roll restart ORES to pick up logging change (use XFF header when possible) - T299137
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19795 and previous config saved to /var/cache/conftool/dbconfig/20220201-112913-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300402)', diff saved to https://phabricator.wikimedia.org/P19794 and previous config saved to /var/cache/conftool/dbconfig/20220201-112325-marostegui.json
  • 11:19 hnowlan: roll-restarting maps services in eqiad for updates
  • 11:17 hnowlan: roll-restarting maps services in codfw for updates
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300402)', diff saved to https://phabricator.wikimedia.org/P19793 and previous config saved to /var/cache/conftool/dbconfig/20220201-111420-marostegui.json
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300402)', diff saved to https://phabricator.wikimedia.org/P19792 and previous config saved to /var/cache/conftool/dbconfig/20220201-111413-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298558)', diff saved to https://phabricator.wikimedia.org/P19791 and previous config saved to /var/cache/conftool/dbconfig/20220201-111409-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298558)', diff saved to https://phabricator.wikimedia.org/P19790 and previous config saved to /var/cache/conftool/dbconfig/20220201-110855-marostegui.json
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298558)', diff saved to https://phabricator.wikimedia.org/P19789 and previous config saved to /var/cache/conftool/dbconfig/20220201-110848-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P19788 and previous config saved to /var/cache/conftool/dbconfig/20220201-105906-marostegui.json
  • 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2105.codfw.wmnet with OS bullseye
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19787 and previous config saved to /var/cache/conftool/dbconfig/20220201-105343-marostegui.json
  • 10:53 Lucas_WMDE: Deployed patch for T297754
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P19786 and previous config saved to /var/cache/conftool/dbconfig/20220201-104402-marostegui.json
  • 10:41 vgutierrez: restart ATS-TLS on cp3058
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s4 codfw T263127', diff saved to https://phabricator.wikimedia.org/P19785 and previous config saved to /var/cache/conftool/dbconfig/20220201-104118-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19784 and previous config saved to /var/cache/conftool/dbconfig/20220201-103838-marostegui.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300402)', diff saved to https://phabricator.wikimedia.org/P19783 and previous config saved to /var/cache/conftool/dbconfig/20220201-102857-marostegui.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions from s4 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19782 and previous config saved to /var/cache/conftool/dbconfig/20220201-102512-marostegui.json
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1010.eqiad.wmnet with OS buster
  • 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2105.codfw.wmnet with OS bullseye
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bumeh-ctr out of all services on: 5 hosts
  • 10:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bumeh-ctr out of all services on: 5 hosts
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300402)', diff saved to https://phabricator.wikimedia.org/P19781 and previous config saved to /var/cache/conftool/dbconfig/20220201-102356-marostegui.json
  • 10:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298558)', diff saved to https://phabricator.wikimedia.org/P19780 and previous config saved to /var/cache/conftool/dbconfig/20220201-102333-marostegui.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300402)', diff saved to https://phabricator.wikimedia.org/P19779 and previous config saved to /var/cache/conftool/dbconfig/20220201-102300-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298558)', diff saved to https://phabricator.wikimedia.org/P19778 and previous config saved to /var/cache/conftool/dbconfig/20220201-102221-marostegui.json
  • 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298558)', diff saved to https://phabricator.wikimedia.org/P19777 and previous config saved to /var/cache/conftool/dbconfig/20220201-102207-marostegui.json
  • 10:14 vgutierrez: pool cp3062 running envoy as TLS terminator - T271421
  • 10:10 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply on staging
  • 10:10 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply on production
  • 10:08 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: sync on production
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P19775 and previous config saved to /var/cache/conftool/dbconfig/20220201-100756-marostegui.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19774 and previous config saved to /var/cache/conftool/dbconfig/20220201-100703-marostegui.json
  • 10:05 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply on staging
  • 10:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply on production
  • 10:01 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow6001.drmrs.wmnet
  • 10:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
  • 10:01 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: sync on staging
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: repooling', diff saved to https://phabricator.wikimedia.org/P19773 and previous config saved to /var/cache/conftool/dbconfig/20220201-100052-root.json
  • 10:00 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply on production
  • 10:00 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply on staging
  • 09:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1010.eqiad.wmnet with OS buster
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P19772 and previous config saved to /var/cache/conftool/dbconfig/20220201-095251-marostegui.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19771 and previous config saved to /var/cache/conftool/dbconfig/20220201-095158-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: repooling', diff saved to https://phabricator.wikimedia.org/P19770 and previous config saved to /var/cache/conftool/dbconfig/20220201-094548-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300402)', diff saved to https://phabricator.wikimedia.org/P19769 and previous config saved to /var/cache/conftool/dbconfig/20220201-093747-marostegui.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300402)', diff saved to https://phabricator.wikimedia.org/P19768 and previous config saved to /var/cache/conftool/dbconfig/20220201-093717-marostegui.json
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300402)', diff saved to https://phabricator.wikimedia.org/P19767 and previous config saved to /var/cache/conftool/dbconfig/20220201-093709-marostegui.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298558)', diff saved to https://phabricator.wikimedia.org/P19766 and previous config saved to /var/cache/conftool/dbconfig/20220201-093653-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: repooling', diff saved to https://phabricator.wikimedia.org/P19765 and previous config saved to /var/cache/conftool/dbconfig/20220201-093044-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P19764 and previous config saved to /var/cache/conftool/dbconfig/20220201-092204-marostegui.json
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2127.codfw.wmnet with OS bullseye
  • 09:20 moritzm: installing apache/apache-modsecurity2 security updates
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS bullseye
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298558)', diff saved to https://phabricator.wikimedia.org/P19763 and previous config saved to /var/cache/conftool/dbconfig/20220201-091541-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: repooling', diff saved to https://phabricator.wikimedia.org/P19762 and previous config saved to /var/cache/conftool/dbconfig/20220201-091541-root.json
  • 09:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19761 and previous config saved to /var/cache/conftool/dbconfig/20220201-091534-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P19760 and previous config saved to /var/cache/conftool/dbconfig/20220201-090700-marostegui.json
  • 09:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
  • 09:02 mmandere: apt1001 Delete unused stretch and buster dist libvarnisapi1 package T300264
  • 09:01 vgutierrez: depool cp3062 to be reimaged as cache::text_envoy - T271421
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: repooling', diff saved to https://phabricator.wikimedia.org/P19759 and previous config saved to /var/cache/conftool/dbconfig/20220201-090031-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19758 and previous config saved to /var/cache/conftool/dbconfig/20220201-090029-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1100.eqiad.wmnet with OS bullseye
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300402)', diff saved to https://phabricator.wikimedia.org/P19757 and previous config saved to /var/cache/conftool/dbconfig/20220201-085155-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300402)', diff saved to https://phabricator.wikimedia.org/P19756 and previous config saved to /var/cache/conftool/dbconfig/20220201-085040-marostegui.json
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300402)', diff saved to https://phabricator.wikimedia.org/P19755 and previous config saved to /var/cache/conftool/dbconfig/20220201-084956-marostegui.json
  • 08:46 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2127.codfw.wmnet with OS bullseye
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19754 and previous config saved to /var/cache/conftool/dbconfig/20220201-084524-marostegui.json
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS bullseye
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2074.codfw.wmnet with OS bullseye
  • 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2109.codfw.wmnet with OS bullseye
  • 08:38 moritzm: draining ganeti1016 for eventual reimage
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P19753 and previous config saved to /var/cache/conftool/dbconfig/20220201-083452-marostegui.json
  • 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1100.eqiad.wmnet with OS bullseye
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19752 and previous config saved to /var/cache/conftool/dbconfig/20220201-083020-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19751 and previous config saved to /var/cache/conftool/dbconfig/20220201-082906-marostegui.json
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298558)', diff saved to https://phabricator.wikimedia.org/P19750 and previous config saved to /var/cache/conftool/dbconfig/20220201-082825-marostegui.json
  • 08:28 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1100.eqiad.wmnet with OS bullseye
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 08:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P19749 and previous config saved to /var/cache/conftool/dbconfig/20220201-081947-marostegui.json
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1100.eqiad.wmnet with OS bullseye
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19748 and previous config saved to /var/cache/conftool/dbconfig/20220201-081321-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for reimage T300473', diff saved to https://phabricator.wikimedia.org/P19747 and previous config saved to /var/cache/conftool/dbconfig/20220201-081050-marostegui.json
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2109.codfw.wmnet with OS bullseye
  • 08:06 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2074.codfw.wmnet with OS bullseye
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: repooling', diff saved to https://phabricator.wikimedia.org/P19746 and previous config saved to /var/cache/conftool/dbconfig/20220201-080449-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300402)', diff saved to https://phabricator.wikimedia.org/P19745 and previous config saved to /var/cache/conftool/dbconfig/20220201-080442-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300402)', diff saved to https://phabricator.wikimedia.org/P19744 and previous config saved to /var/cache/conftool/dbconfig/20220201-080328-marostegui.json
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300402)', diff saved to https://phabricator.wikimedia.org/P19743 and previous config saved to /var/cache/conftool/dbconfig/20220201-080315-marostegui.json
  • 08:01 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19742 and previous config saved to /var/cache/conftool/dbconfig/20220201-075816-marostegui.json
  • 07:56 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1005.eqiad.wmnet
  • 07:56 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus1005.eqiad.wmnet
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: repooling', diff saved to https://phabricator.wikimedia.org/P19741 and previous config saved to /var/cache/conftool/dbconfig/20220201-074945-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P19740 and previous config saved to /var/cache/conftool/dbconfig/20220201-074810-marostegui.json
  • 07:47 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1005.eqiad.wmnet
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298558)', diff saved to https://phabricator.wikimedia.org/P19739 and previous config saved to /var/cache/conftool/dbconfig/20220201-074311-marostegui.json
  • 07:39 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus1005.eqiad.wmnet
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: repooling', diff saved to https://phabricator.wikimedia.org/P19738 and previous config saved to /var/cache/conftool/dbconfig/20220201-073441-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P19737 and previous config saved to /var/cache/conftool/dbconfig/20220201-073306-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298558)', diff saved to https://phabricator.wikimedia.org/P19736 and previous config saved to /var/cache/conftool/dbconfig/20220201-073256-marostegui.json
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19735 and previous config saved to /var/cache/conftool/dbconfig/20220201-073248-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: repooling', diff saved to https://phabricator.wikimedia.org/P19734 and previous config saved to /var/cache/conftool/dbconfig/20220201-071938-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300402)', diff saved to https://phabricator.wikimedia.org/P19733 and previous config saved to /var/cache/conftool/dbconfig/20220201-071801-marostegui.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19732 and previous config saved to /var/cache/conftool/dbconfig/20220201-071743-marostegui.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300402)', diff saved to https://phabricator.wikimedia.org/P19731 and previous config saved to /var/cache/conftool/dbconfig/20220201-071648-marostegui.json
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300402)', diff saved to https://phabricator.wikimedia.org/P19730 and previous config saved to /var/cache/conftool/dbconfig/20220201-071640-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: repooling', diff saved to https://phabricator.wikimedia.org/P19729 and previous config saved to /var/cache/conftool/dbconfig/20220201-070434-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19728 and previous config saved to /var/cache/conftool/dbconfig/20220201-070239-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P19727 and previous config saved to /var/cache/conftool/dbconfig/20220201-070135-marostegui.json
  • 06:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1110.eqiad.wmnet with OS bullseye
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: repooling', diff saved to https://phabricator.wikimedia.org/P19726 and previous config saved to /var/cache/conftool/dbconfig/20220201-064930-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19725 and previous config saved to /var/cache/conftool/dbconfig/20220201-064734-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P19724 and previous config saved to /var/cache/conftool/dbconfig/20220201-064631-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19723 and previous config saved to /var/cache/conftool/dbconfig/20220201-064620-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19722 and previous config saved to /var/cache/conftool/dbconfig/20220201-064549-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: repooling', diff saved to https://phabricator.wikimedia.org/P19721 and previous config saved to /var/cache/conftool/dbconfig/20220201-064149-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300402)', diff saved to https://phabricator.wikimedia.org/P19720 and previous config saved to /var/cache/conftool/dbconfig/20220201-063126-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19719 and previous config saved to /var/cache/conftool/dbconfig/20220201-063044-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300402)', diff saved to https://phabricator.wikimedia.org/P19718 and previous config saved to /var/cache/conftool/dbconfig/20220201-063013-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: repooling', diff saved to https://phabricator.wikimedia.org/P19717 and previous config saved to /var/cache/conftool/dbconfig/20220201-062646-root.json
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1110.eqiad.wmnet with OS bullseye
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage T300473', diff saved to https://phabricator.wikimedia.org/P19716 and previous config saved to /var/cache/conftool/dbconfig/20220201-062111-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19715 and previous config saved to /var/cache/conftool/dbconfig/20220201-061540-marostegui.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: repooling', diff saved to https://phabricator.wikimedia.org/P19714 and previous config saved to /var/cache/conftool/dbconfig/20220201-061142-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19713 and previous config saved to /var/cache/conftool/dbconfig/20220201-060035-marostegui.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298558)', diff saved to https://phabricator.wikimedia.org/P19712 and previous config saved to /var/cache/conftool/dbconfig/20220201-055921-marostegui.json
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: repooling', diff saved to https://phabricator.wikimedia.org/P19711 and previous config saved to /var/cache/conftool/dbconfig/20220201-055638-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19710 and previous config saved to /var/cache/conftool/dbconfig/20220201-055327-marostegui.json
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 03:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 03:36 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 02:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:48 ryankemper: T282117 Merged https://gerrit.wikimedia.org/r/c/operations/dns/+/717606 and successfully ran `sudo -i authdns-update` on `authdns1001`. `commons-query.wikimedia.org` is online now. (sidenote: go-live date of service is 2022-02-01)
  • 01:42 ryankemper: T299222 `ryankemper@cumin1001:~$ sudo cumin 'wcqs*' 'sudo rm -fv /etc/default/wcqs-updater'`
  • 01:42 ryankemper: T299222 `ryankemper@cumin1001:~$ sudo cumin 'wdqs*' 'sudo rm -fv /etc/default/wdqs-updater'`
  • 01:25 ryankemper: T299222 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/757124; running puppet on `w*qs*` before purging old filepaths
  • 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:24 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Local upload on ptwikinews (T300466) (duration: 00m 50s)
  • 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:18 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 00:11 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Lower The Wikipedia Library extension edit count (T288070) (duration: 00m 50s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

Archives

See Server Admin Log/Archives.