You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Server Admin Log: Difference between revisions
imported>Stashbot (derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set archive namespaces on foundationwiki to 'noindex,follow' (T288763) (duration: 00m 59s)) |
imported>Stashbot (ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48979 and previous config saved to /var/cache/conftool/dbconfig/20230607-011602-ladsgroup.json) |
== 2023-06-07 ==
* 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48979 and previous config saved to /var/cache/conftool/dbconfig/20230607-011602-ladsgroup.json
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48978 and previous config saved to /var/cache/conftool/dbconfig/20230607-011553-ladsgroup.json
* 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48977 and previous config saved to /var/cache/conftool/dbconfig/20230607-010055-ladsgroup.json
* 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48976 and previous config saved to /var/cache/conftool/dbconfig/20230607-010047-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48975 and previous config saved to /var/cache/conftool/dbconfig/20230607-005722-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48974 and previous config saved to /var/cache/conftool/dbconfig/20230607-005713-ladsgroup.json
* 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
* 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 00:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48973 and previous config saved to /var/cache/conftool/dbconfig/20230607-005654-ladsgroup.json
* 00:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 00:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48972 and previous config saved to /var/cache/conftool/dbconfig/20230607-005155-ladsgroup.json
* 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48971 and previous config saved to /var/cache/conftool/dbconfig/20230607-004148-ladsgroup.json
* 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48970 and previous config saved to /var/cache/conftool/dbconfig/20230607-003649-ladsgroup.json
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48969 and previous config saved to /var/cache/conftool/dbconfig/20230607-002642-ladsgroup.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48968 and previous config saved to /var/cache/conftool/dbconfig/20230607-002143-ladsgroup.json
* 00:14 urbanecm:: Deployed security patch for [[phab:T338276|T338276]]
* 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48967 and previous config saved to /var/cache/conftool/dbconfig/20230607-001136-ladsgroup.json
* 00:08 urbanecm:: Deployed security patch for [[phab:T338276|T338276]]
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48966 and previous config saved to /var/cache/conftool/dbconfig/20230607-000814-ladsgroup.json
* 00:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
* 00:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
* 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48965 and previous config saved to /var/cache/conftool/dbconfig/20230607-000754-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48964 and previous config saved to /var/cache/conftool/dbconfig/20230607-000637-ladsgroup.json
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48963 and previous config saved to /var/cache/conftool/dbconfig/20230607-000337-ladsgroup.json
* 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48962 and previous config saved to /var/cache/conftool/dbconfig/20230607-000316-ladsgroup.json
* 00:01 urbanecm: Deploying security patch for [[phab:T338276|T338276]]
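
The entries above appear to share one shape: a bullet, an `HH:MM` time, an actor (optionally `user@host`), a colon, and a free-text message. The following is a minimal sketch of splitting one entry into those parts, assuming that shape holds. The `Entry` type, the `parse_entry` helper, and the regular expression are hypothetical names introduced for illustration and are not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One log line, split into its apparent fields (hypothetical type)."""
    time: str
    actor: str
    host: Optional[str]
    message: str


# Assumed shape: "* HH:MM actor[@host]: message". Some entries carry a
# doubled colon after the actor ("urbanecm::"), so one or more are accepted.
ENTRY_RE = re.compile(
    r"^\*\s+(?P<time>\d{2}:\d{2})\s+"
    r"(?P<actor>[^\s:@]+)(?:@(?P<host>[^\s:]+))?:+\s*"
    r"(?P<message>.*)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line does not fit the shape."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["time"], m["actor"], m["host"], m["message"])


if __name__ == "__main__":
    sample = "* 00:01 urbanecm: Deploying security patch for [[phab:T338276|T338276]]"
    print(parse_entry(sample))
```

Headings such as `== 2023-06-07 ==` do not match and yield `None`, so a caller would need to track the current date separately.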
== 2023-06-06 ==
* 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48961 and previous config saved to /var/cache/conftool/dbconfig/20230606-235248-ladsgroup.json | |||
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48960 and previous config saved to /var/cache/conftool/dbconfig/20230606-234810-ladsgroup.json | |||
* 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet | |||
* 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48959 and previous config saved to /var/cache/conftool/dbconfig/20230606-233742-ladsgroup.json | |||
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48958 and previous config saved to /var/cache/conftool/dbconfig/20230606-233304-ladsgroup.json | |||
* 23:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye | |||
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48955 and previous config saved to /var/cache/conftool/dbconfig/20230606-232235-ladsgroup.json | |||
* 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox | |||
== 2023-06-05 ==
* 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48812 and previous config saved to /var/cache/conftool/dbconfig/20230605-235346-ladsgroup.json | |||
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | |||
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | |||
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance | |||
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance | |||
* 23 | |||
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48741 and previous config saved to /var/cache/conftool/dbconfig/20230605-150347-ladsgroup.json | |||
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48740 and previous config saved to /var/cache/conftool/dbconfig/20230605-150138-ladsgroup.json | |||
* 15:01 ladsgroup@cumin1001: END (PASS) | |||
== 2023-06-03 ==
* 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
* 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
* 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
* 13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
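
The downtime entries in this log give durations in forms such as `12:00:00`, `1 day, 0:00:00`, and `30 days, 0:00:00`. These look like what Python's `str(timedelta)` produces, though this log does not confirm how the cookbook formats them. As a sketch under that assumption, the hypothetical helper `parse_duration` below turns such a string back into a `timedelta`.

```python
import re
from datetime import timedelta

# Assumed format: optional "N day(s), " prefix followed by H:MM:SS.
DURATION_RE = re.compile(
    r"^(?:(?P<days>\d+) days?, )?(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '12:00:00' or '30 days, 0:00:00'."""
    m = DURATION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return timedelta(
        days=int(m["days"] or 0),
        hours=int(m["h"]),
        minutes=int(m["m"]),
        seconds=int(m["s"]),
    )


if __name__ == "__main__":
    for raw in ("12:00:00", "1 day, 0:00:00", "30 days, 0:00:00"):
        print(raw, "->", parse_duration(raw).total_seconds(), "seconds")
```

Adding the result to the entry's timestamp would give an estimate of when the downtime ends, assuming it was not removed early with `sre.hosts.remove-downtime`.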
== 2023-06-02 ==
* 20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
* 18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002" | |||
* 17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002" | |||
* 17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups | |||
* 16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed | |||
* 15:35 sukhe: restart ntp.service on dns1002 | |||
* 13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479 | |||
* 13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release | |||
* 09:35 moritzm: installing texlive-security updates on buster | |||
* 09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) [[phab:T337836|T337836]]. Reload systemd for unit changes to take effect | |||
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet | |||
* 08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet | |||
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet | |||
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts: | |||
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts: | |||
* 08:42 moritzm: installing traceroute bugfix updates from Bullseye point release | |||
* 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org | |||
* 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org | |||
* 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org | |||
* 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org | |||
* 07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org | |||
* 07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org | |||
* 01:53 ejegg: fundraising python tools upgraded from {{Gerrit|759d4c89}} to {{Gerrit|2ca83336}} | |||
* 01:22 cstone: civicrm upgraded from {{Gerrit|3819d6d1}} to {{Gerrit|bcc8fccc}} | |||
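
Cookbook runs in this log are recorded as a `START` entry and a later `END (STATUS)` entry, with the newest entries first. The sketch below pairs the two to estimate how long each run took. It assumes both entries fall on the same day and that the text after the cookbook name is identical apart from the `(exit_code=N)` marker. The `cookbook_runs` helper and its regular expression are hypothetical and written only for illustration.

```python
import re
from datetime import datetime

# Assumed shape: "* HH:MM who: START - Cookbook NAME rest"
#            or: "* HH:MM who: END (STATUS) - Cookbook NAME (exit_code=N) rest"
LINE_RE = re.compile(
    r"^\*\s+(?P<time>\d{2}:\d{2})\s+(?P<who>\S+):\s+"
    r"(?P<kind>START|END \((?P<status>\w+)\)) - Cookbook "
    r"(?P<name>\S+)(?P<rest>.*)$"
)


def cookbook_runs(lines):
    """Pair START/END entries; return (name, target, status, minutes) tuples.

    `lines` is expected newest-first, as in the log. Runs that cross
    midnight are not handled by this sketch.
    """
    pending = {}
    runs = []
    for line in reversed(lines):  # walk oldest-first
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        # Drop "(exit_code=N)" so START and END share the same key.
        target = re.sub(r"\s*\(exit_code=\d+\)", "", m["rest"]).strip()
        key = (m["who"], m["name"], target)
        when = datetime.strptime(m["time"], "%H:%M")
        if m["kind"] == "START":
            pending[key] = when
        elif key in pending:
            started = pending.pop(key)
            minutes = int((when - started).total_seconds() // 60)
            runs.append((m["name"], target, m["status"], minutes))
    return runs


if __name__ == "__main__":
    sample = [
        "* 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org",
        "* 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org",
    ]
    for run in cookbook_runs(sample):
        print(run)
```

Because the log only records minutes, the computed durations are approximate to within a minute.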
== 2023-06-01 ==
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] (duration: 08m 30s)
* 20:59 samtar@deploy1002: esanders and samtar: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:57 samtar@deploy1002: Started scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]]
* 20:54 samtar@deploy1002: Finished scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] (duration: 10m 29s)
* 20:45 samtar@deploy1002: samtar and ksarabia: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:44 samtar@deploy1002: Started scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] | |||
* 20:21 samtar@deploy1002: Finished scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] (duration: 07m 56s) | |||
* 20:15 samtar@deploy1002: dani and samtar: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:13 samtar@deploy1002: Started scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] | |||
* 20:12 samtar@deploy1002: Finished scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] (duration: 08m 20s) | |||
* 20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] | |||
* 19:51 ejegg: fundraising python tools upgraded from {{Gerrit|72570bdd}} to {{Gerrit|759d4c89}} | |||
* 19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s) | |||
* 19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided) | |||
* 19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s) | |||
* 19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work | |||
* 18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]] | |||
* 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s) | |||
* 17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp | |||
* 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) | |||
* 17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply | |||
* 17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply | |||
* 17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp | |||
* 17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply | |||
* 17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply | |||
* 17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply | |||
* 17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply | |||
* 17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove undeeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - [[phab:T336817|T336817]] (duration: 07m 24s) | |||
* 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002" | |||
* 16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002" | |||
* 16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove undeeded wgEventBusStreamNamesMap override setting - [[phab:T336817|T336817]] (duration: 08m 18s) | |||
* 16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye | |||
* 16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] (round 2!) | |||
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" | |||
* 16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" | |||
* 16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage | |||
* 16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage | |||
* 16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage | |||
* 16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - [[phab:T337041|T337041]] | |||
* 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage | |||
* 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook | |||
* 15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] | |||
* 15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes | |||
* 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye | |||
* 14:59 moritzm: installing python-sqlparse security updates | |||
* 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox | |||
* 14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 14:53 moritzm: installing jackson-databind security updates | |||
* 14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox | |||
* 14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook | |||
* 14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage | |||
* 14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage | |||
* 14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp | |||
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary | |||
* 14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary | |||
* 14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp | |||
* 14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 14:29 moritzm: installing imagemagick security updates on buster | |||
* 14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye | |||
* 14:14 fabfur: Disabled puppet on A:cp-drmrs for [[phab:T323557|T323557]] | |||
* 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s) | |||
* 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) | |||
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json | |||
* 14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - [[phab:T337505|T337505]] | |||
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json | |||
* 13:52 moritzm: installing sysstat security updates | |||
* 13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json | |||
* 13:29 moritzm: installing openssl security updates on bullseye | |||
* 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json | |||
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json | |||
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance | |||
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance | |||
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance | |||
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance | |||
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json | |||
* 13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - [[phab:T337505|T337505]] | |||
* 13:13 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] (duration: 11m 08s) | |||
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json | |||
* 13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 13:02 urbanecm@deploy1002: Started scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] | |||
* 12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply | |||
* 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply | |||
* 12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json | |||
* 12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json | |||
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json | |||
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance | |||
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance | |||
* 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance | |||
* 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance | |||
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json | |||
* 12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json | |||
* 12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # [[phab:T336365|T336365]] | |||
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json | |||
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json | |||
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json | |||
* 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance | |||
* 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance | |||
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json | |||
* 11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json | |||
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json | |||
* 11:04 jayme: disabling puppet on all kubernetes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707 | |||
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json | |||
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json | |||
* 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance | |||
* 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance | |||
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json | |||
* 10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json | |||
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json | |||
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002" | |||
* 10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002" | |||
* 10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox | |||
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json | |||
* 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json | |||
* 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance | |||
* 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance | |||
* 09:56 moritzm: installing systemd security updates on bullseye | |||
* 09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php ([[phab:T336224|T336224]]) | |||
* 09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G | |||
* 09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev'] | |||
* 09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev'] | |||
* 09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev'] | |||
* 09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev'] | |||
* 09:32 volans: installed spicerack v7.2.0 on cumin2002 | |||
* 09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet | |||
* 09:18 godog: remove lv prometheus-global - [[phab:T288196|T288196]] | |||
* 09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet | |||
* 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet | |||
* 09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet | |||
* 09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet | |||
* 09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet | |||
* 09:12 volans: installed spicerack v7.2.0 on cumin1001 | |||
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet | |||
* 09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet | |||
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet | |||
* 09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet | |||
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet | |||
* 08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet | |||
* 08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye | |||
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001" | |||
* 08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001" | |||
* 08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox | |||
* 08:48 apergos: UTC morning backport and config training window done | |||
* 08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:28 daniel@deploy1002: Finished scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] (duration: 10m 12s) | |||
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:19 daniel@deploy1002: daniel and isaranto: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 08:18 daniel@deploy1002: Started scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] | |||
* 07:55 daniel@deploy1002: Finished scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] (duration: 09m 09s) | |||
* 07:48 daniel@deploy1002: daniel: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 07:46 daniel@deploy1002: Started scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] | |||
* 07:42 mlitn@deploy1002: Finished scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] (duration: 33m 02s) | |||
* 07:35 moritzm: installing libssh security updates | |||
* 07:29 mlitn@deploy1002: mlitn: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress | |||
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress | |||
* 07:09 mlitn@deploy1002: Started scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] | |||
* 06:16 kart_: Updated MinT to 2023-06-01-041041-production ([[phab:T336525|T336525]]) | |||
* 06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 05:39 kart_: Updated cxserver to 2023-06-01-041016-production ([[phab:T337669|T337669]]) | |||
* 05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 00:11 eileen: civicrm upgraded from {{Gerrit|885208ca}} to {{Gerrit|3819d6d1}} | |||
==Archives== | ==Archives == | ||
See [[Server Admin Log/Archives]]. | See [[Server Admin Log/Archives]]. | ||
<noinclude> | <noinclude> |
Latest revision as of 01:16, 7 June 2023
2023-06-07
- 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48979 and previous config saved to /var/cache/conftool/dbconfig/20230607-011602-ladsgroup.json
- 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48978 and previous config saved to /var/cache/conftool/dbconfig/20230607-011553-ladsgroup.json
- 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48977 and previous config saved to /var/cache/conftool/dbconfig/20230607-010055-ladsgroup.json
- 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48976 and previous config saved to /var/cache/conftool/dbconfig/20230607-010047-ladsgroup.json
- 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48975 and previous config saved to /var/cache/conftool/dbconfig/20230607-005722-ladsgroup.json
- 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48974 and previous config saved to /var/cache/conftool/dbconfig/20230607-005713-ladsgroup.json
- 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 00:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48973 and previous config saved to /var/cache/conftool/dbconfig/20230607-005654-ladsgroup.json
- 00:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 00:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48972 and previous config saved to /var/cache/conftool/dbconfig/20230607-005155-ladsgroup.json
- 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48971 and previous config saved to /var/cache/conftool/dbconfig/20230607-004148-ladsgroup.json
- 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48970 and previous config saved to /var/cache/conftool/dbconfig/20230607-003649-ladsgroup.json
- 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48969 and previous config saved to /var/cache/conftool/dbconfig/20230607-002642-ladsgroup.json
- 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48968 and previous config saved to /var/cache/conftool/dbconfig/20230607-002143-ladsgroup.json
- 00:14 urbanecm: Deployed security patch for T338276
- 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48967 and previous config saved to /var/cache/conftool/dbconfig/20230607-001136-ladsgroup.json
- 00:08 urbanecm: Deployed security patch for T338276
- 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48966 and previous config saved to /var/cache/conftool/dbconfig/20230607-000814-ladsgroup.json
- 00:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 00:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48965 and previous config saved to /var/cache/conftool/dbconfig/20230607-000754-ladsgroup.json
- 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48964 and previous config saved to /var/cache/conftool/dbconfig/20230607-000637-ladsgroup.json
- 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48963 and previous config saved to /var/cache/conftool/dbconfig/20230607-000337-ladsgroup.json
- 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48962 and previous config saved to /var/cache/conftool/dbconfig/20230607-000316-ladsgroup.json
- 00:01 urbanecm: Deploying security patch for T338276
2023-06-06
- 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48961 and previous config saved to /var/cache/conftool/dbconfig/20230606-235248-ladsgroup.json
- 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48960 and previous config saved to /var/cache/conftool/dbconfig/20230606-234810-ladsgroup.json
- 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
- 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48959 and previous config saved to /var/cache/conftool/dbconfig/20230606-233742-ladsgroup.json
- 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48958 and previous config saved to /var/cache/conftool/dbconfig/20230606-233304-ladsgroup.json
- 23:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
- 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48955 and previous config saved to /var/cache/conftool/dbconfig/20230606-232235-ladsgroup.json
- 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
- 23:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
- 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48954 and previous config saved to /var/cache/conftool/dbconfig/20230606-231913-ladsgroup.json
- 23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 23:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48953 and previous config saved to /var/cache/conftool/dbconfig/20230606-231853-ladsgroup.json
- 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48952 and previous config saved to /var/cache/conftool/dbconfig/20230606-231758-ladsgroup.json
- 23:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 23:16 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
- 23:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
- 23:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
- 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48951 and previous config saved to /var/cache/conftool/dbconfig/20230606-231408-ladsgroup.json
- 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48950 and previous config saved to /var/cache/conftool/dbconfig/20230606-231347-ladsgroup.json
- 23:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48949 and previous config saved to /var/cache/conftool/dbconfig/20230606-230347-ladsgroup.json
- 22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48948 and previous config saved to /var/cache/conftool/dbconfig/20230606-225841-ladsgroup.json
- 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 22:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 22:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48947 and previous config saved to /var/cache/conftool/dbconfig/20230606-224841-ladsgroup.json
- 22:48 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48946 and previous config saved to /var/cache/conftool/dbconfig/20230606-224334-ladsgroup.json
- 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48945 and previous config saved to /var/cache/conftool/dbconfig/20230606-223335-ladsgroup.json
- 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48944 and previous config saved to /var/cache/conftool/dbconfig/20230606-223011-ladsgroup.json
- 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48943 and previous config saved to /var/cache/conftool/dbconfig/20230606-222950-ladsgroup.json
- 22:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
- 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48942 and previous config saved to /var/cache/conftool/dbconfig/20230606-222828-ladsgroup.json
- 22:27 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp everywhere (T299954) (duration: 07m 33s)
- 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48941 and previous config saved to /var/cache/conftool/dbconfig/20230606-222534-ladsgroup.json
- 22:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 22:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48940 and previous config saved to /var/cache/conftool/dbconfig/20230606-222513-ladsgroup.json
- 22:21 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp everywhere (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 22:19 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp everywhere (T299954)
- 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48939 and previous config saved to /var/cache/conftool/dbconfig/20230606-221444-ladsgroup.json
- 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48938 and previous config saved to /var/cache/conftool/dbconfig/20230606-221007-ladsgroup.json
- 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48937 and previous config saved to /var/cache/conftool/dbconfig/20230606-215938-ladsgroup.json
- 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48936 and previous config saved to /var/cache/conftool/dbconfig/20230606-215501-ladsgroup.json
- 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48935 and previous config saved to /var/cache/conftool/dbconfig/20230606-214432-ladsgroup.json
- 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48934 and previous config saved to /var/cache/conftool/dbconfig/20230606-214109-ladsgroup.json
- 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 21:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 21:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48933 and previous config saved to /var/cache/conftool/dbconfig/20230606-214048-ladsgroup.json
- 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48932 and previous config saved to /var/cache/conftool/dbconfig/20230606-213954-ladsgroup.json
- 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48931 and previous config saved to /var/cache/conftool/dbconfig/20230606-213702-ladsgroup.json
- 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48930 and previous config saved to /var/cache/conftool/dbconfig/20230606-213641-ladsgroup.json
- 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48929 and previous config saved to /var/cache/conftool/dbconfig/20230606-212542-ladsgroup.json
- 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48928 and previous config saved to /var/cache/conftool/dbconfig/20230606-212135-ladsgroup.json
- 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48927 and previous config saved to /var/cache/conftool/dbconfig/20230606-211036-ladsgroup.json
- 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48926 and previous config saved to /var/cache/conftool/dbconfig/20230606-210629-ladsgroup.json
- 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
- 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
- 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48925 and previous config saved to /var/cache/conftool/dbconfig/20230606-205530-ladsgroup.json
- 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48924 and previous config saved to /var/cache/conftool/dbconfig/20230606-205206-ladsgroup.json
- 20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 20:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48923 and previous config saved to /var/cache/conftool/dbconfig/20230606-205123-ladsgroup.json
- 20:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 20:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48922 and previous config saved to /var/cache/conftool/dbconfig/20230606-205002-ladsgroup.json
- 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48921 and previous config saved to /var/cache/conftool/dbconfig/20230606-204527-ladsgroup.json
- 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48920 and previous config saved to /var/cache/conftool/dbconfig/20230606-204506-ladsgroup.json
- 20:41 urbanecm@deploy1002: Finished scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) (duration: 07m 23s)
- 20:35 urbanecm@deploy1002: urbanecm: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48919 and previous config saved to /var/cache/conftool/dbconfig/20230606-203456-ladsgroup.json
- 20:34 urbanecm@deploy1002: Started scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078)
- 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48917 and previous config saved to /var/cache/conftool/dbconfig/20230606-203000-ladsgroup.json
- 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48916 and previous config saved to /var/cache/conftool/dbconfig/20230606-201950-ladsgroup.json
- 20:16 mutante: miscweb1003, miscweb2003 - rm -rf /srv/org/wikimedia/sitemaps after removing httpd virtual host T338064
- 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48915 and previous config saved to /var/cache/conftool/dbconfig/20230606-201454-ladsgroup.json
- 20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
- 20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
- 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48914 and previous config saved to /var/cache/conftool/dbconfig/20230606-200444-ladsgroup.json
- 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48913 and previous config saved to /var/cache/conftool/dbconfig/20230606-195948-ladsgroup.json
- 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48912 and previous config saved to /var/cache/conftool/dbconfig/20230606-195557-ladsgroup.json
- 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48911 and previous config saved to /var/cache/conftool/dbconfig/20230606-195320-ladsgroup.json
- 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48910 and previous config saved to /var/cache/conftool/dbconfig/20230606-193814-ladsgroup.json
- 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48909 and previous config saved to /var/cache/conftool/dbconfig/20230606-192308-ladsgroup.json
- 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48908 and previous config saved to /var/cache/conftool/dbconfig/20230606-190802-ladsgroup.json
- 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48907 and previous config saved to /var/cache/conftool/dbconfig/20230606-190420-ladsgroup.json
- 19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48906 and previous config saved to /var/cache/conftool/dbconfig/20230606-190402-ladsgroup.json
- 19:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 19:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 18:10 mutante: disabling https://sitemaps.wikimedia.org - T338064 T332101
- 18:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.12 refs T337526
- 18:01 sukhe: cumin 'A:cp-text' 'enable-puppet "CR 926611" && run-puppet-agent -q'
- 18:01 sukhe: re-enable puppet on A:cp-text and force puppet run: T338064
- 17:54 sukhe: enable puppet on cp4037 to test CR 926611
- 17:50 sukhe: disable puppet on A:cp-text to roll out CR 926611
- 17:39 sukhe: sudo cumin 'P:ntp' 'enable-puppet "testing CR 926598" && run-puppet-agent'
- 17:27 sukhe: sudo cumin 'P:ntp' 'disable-puppet "testing CR 926598"'
- 17:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 17:04 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 17:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 17:01 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 16:51 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 16:41 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 16:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:40 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:39 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:37 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:30 sukhe: low-traffic/codfw: set routing-options static route 10.2.1.0/24 next-hop 10.192.32.14
- 16:27 sukhe: restart pybal on lvs2013 to remove bgp-med override
- 16:23 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 16:12 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 16:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 16:06 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 16:03 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48904 and previous config saved to /var/cache/conftool/dbconfig/20230606-160151-ladsgroup.json
- 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
- 15:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:52 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
- 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48902 and previous config saved to /var/cache/conftool/dbconfig/20230606-154645-ladsgroup.json
- 15:46 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:40 cdanis@deploy1002: Finished scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" (duration: 08m 13s)
- 15:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:37 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:35 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:34 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:34 cdanis@deploy1002: cdanis and otto: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 15:32 zabe: purge wikimaniawiki logos # T337044
- 15:32 cdanis@deploy1002: Started scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion"
- 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48901 and previous config saved to /var/cache/conftool/dbconfig/20230606-153139-ladsgroup.json
- 15:30 zabe@deploy1002: Finished scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) (duration: 08m 02s)
- 15:26 sukhe: homer "cr*-codfw*" commit "Gerrit: 927725 add new LVS host lvs2013" : T326767
- 15:24 zabe@deploy1002: robertsky and zabe: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 15:22 zabe@deploy1002: Started scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044)
- 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2013
- 15:21 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2013
- 15:20 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48900 and previous config saved to /var/cache/conftool/dbconfig/20230606-151633-ladsgroup.json
- 15:12 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_esams and A:cp
- 15:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
- 15:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 15:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics@72d9b87]: (no justification provided) (duration: 00m 10s)
- 15:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@72d9b87]: (no justification provided)
- 15:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 15:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 15:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48899 and previous config saved to /var/cache/conftool/dbconfig/20230606-150141-ladsgroup.json
- 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48898 and previous config saved to /var/cache/conftool/dbconfig/20230606-150120-ladsgroup.json
- 15:00 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
- 14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
- 14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
- 14:56 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
- 14:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
- 14:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
- 14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48897 and previous config saved to /var/cache/conftool/dbconfig/20230606-144614-ladsgroup.json
- 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
- 14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
- 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48896 and previous config saved to /var/cache/conftool/dbconfig/20230606-143107-ladsgroup.json
- 14:25 oblivian@deploy1002: Finished scap: Backport for Load and enable parsoid everywhere (T334980) (duration: 15m 00s)
- 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48895 and previous config saved to /var/cache/conftool/dbconfig/20230606-141601-ladsgroup.json
- 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
- 14:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 14:12 oblivian@deploy1002: oblivian: Backport for Load and enable parsoid everywhere (T334980) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:10 oblivian@deploy1002: Started scap: Backport for Load and enable parsoid everywhere (T334980)
- 14:08 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 14:06 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
- 14:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
- 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
- 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
- 14:01 oblivian@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) (duration: 07m 57s)
- 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48894 and previous config saved to /var/cache/conftool/dbconfig/20230606-140051-ladsgroup.json
- 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48893 and previous config saved to /var/cache/conftool/dbconfig/20230606-140030-ladsgroup.json
- 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 780 hosts
- 13:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 780 hosts
- 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1259 hosts
- 13:57 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1259 hosts
- 13:55 oblivian@deploy1002: oblivian and daniel: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 13:53 oblivian@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366)
- 13:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
- 13:50 oblivian@deploy1002: Finished scap: Backport for Drop wmgMemoryLimitParsoid from IS.php (duration: 07m 21s)
- 13:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
- 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48891 and previous config saved to /var/cache/conftool/dbconfig/20230606-134524-ladsgroup.json
- 13:45 oblivian@deploy1002: oblivian: Backport for Drop wmgMemoryLimitParsoid from IS.php synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 13:43 oblivian@deploy1002: Started scap: Backport for Drop wmgMemoryLimitParsoid from IS.php
- 13:41 oblivian@deploy1002: Finished scap: Backport for Raise memory limit to match parsoid (T334980) (duration: 07m 53s)
- 13:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 13:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 13:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
- 13:35 oblivian@deploy1002: oblivian: Backport for Raise memory limit to match parsoid (T334980) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
- 13:33 oblivian@deploy1002: Started scap: Backport for Raise memory limit to match parsoid (T334980)
- 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48890 and previous config saved to /var/cache/conftool/dbconfig/20230606-133018-ladsgroup.json
- 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48889 and previous config saved to /var/cache/conftool/dbconfig/20230606-131512-ladsgroup.json
- 13:11 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 13:06 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Disable canary events and hadoop ingestion for development.network.probe - T332024 (duration: 07m 17s)
- 13:00 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48888 and previous config saved to /var/cache/conftool/dbconfig/20230606-125944-ladsgroup.json
- 12:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 12:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48887 and previous config saved to /var/cache/conftool/dbconfig/20230606-125923-ladsgroup.json
- 12:56 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_esams and A:cp
- 12:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
- 12:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
- 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48886 and previous config saved to /var/cache/conftool/dbconfig/20230606-124417-ladsgroup.json
- 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48885 and previous config saved to /var/cache/conftool/dbconfig/20230606-122911-ladsgroup.json
- 12:21 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 10s)
- 12:19 cgoubert@deploy1002: Started scap: (no justification provided)
- 12:19 claime: redeploying 927218 to mw-on-k8s - T338121
- 12:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48884 and previous config saved to /var/cache/conftool/dbconfig/20230606-121405-ladsgroup.json
- 12:09 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 12:00 kamila@deploy1002: Finished scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) (duration: 08m 54s)
- 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48881 and previous config saved to /var/cache/conftool/dbconfig/20230606-115911-ladsgroup.json
- 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48880 and previous config saved to /var/cache/conftool/dbconfig/20230606-115833-ladsgroup.json
- 11:53 kamila@deploy1002: kamila and klausman: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 11:51 kamila@deploy1002: Started scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)
- 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48879 and previous config saved to /var/cache/conftool/dbconfig/20230606-114327-ladsgroup.json
- 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48878 and previous config saved to /var/cache/conftool/dbconfig/20230606-112819-ladsgroup.json
- 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48877 and previous config saved to /var/cache/conftool/dbconfig/20230606-111313-ladsgroup.json
- 11:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
- 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48876 and previous config saved to /var/cache/conftool/dbconfig/20230606-105756-ladsgroup.json
- 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48875 and previous config saved to /var/cache/conftool/dbconfig/20230606-105724-ladsgroup.json
- 10:53 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 10:53 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 10:52 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 10:51 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) (duration: 07m 03s)
- 10:51 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 10:50 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 10:50 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 10:50 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 10:50 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 10:46 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 10:44 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954)
- 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48874 and previous config saved to /var/cache/conftool/dbconfig/20230606-104218-ladsgroup.json
- 10:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 10:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 10:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48873 and previous config saved to /var/cache/conftool/dbconfig/20230606-102712-ladsgroup.json
- 10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:20 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.10 (duration: 02m 18s)
- 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:17 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.12 refs T337526 (duration: 56m 25s)
- 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48872 and previous config saved to /var/cache/conftool/dbconfig/20230606-101205-ladsgroup.json
- 10:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:02 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 10:01 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 10:00 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 09:59 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 09:58 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 09:58 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48871 and previous config saved to /var/cache/conftool/dbconfig/20230606-095512-ladsgroup.json
- 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48870 and previous config saved to /var/cache/conftool/dbconfig/20230606-095451-ladsgroup.json
- 09:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 09:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48869 and previous config saved to /var/cache/conftool/dbconfig/20230606-093945-ladsgroup.json
- 09:34 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
- 09:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_esams and A:cp
- 09:27 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
- 09:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 09:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48867 and previous config saved to /var/cache/conftool/dbconfig/20230606-092439-ladsgroup.json
- 09:21 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.12 refs T337526
- 09:18 jynus: running systemctl start train-presync
- 09:16 vgutierrez: restarting acme-chief and nginx on acme-chief instances
- 09:11 claime: Building production images - T338014
- 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48866 and previous config saved to /var/cache/conftool/dbconfig/20230606-090933-ladsgroup.json
- 08:59 urbanecm: deploy1002: run /usr/local/sbin/fix-staging-perms (T338205)
- 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
- 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
- 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48865 and previous config saved to /var/cache/conftool/dbconfig/20230606-085337-ladsgroup.json
- 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48864 and previous config saved to /var/cache/conftool/dbconfig/20230606-085317-ladsgroup.json
- 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
- 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
- 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48863 and previous config saved to /var/cache/conftool/dbconfig/20230606-083810-ladsgroup.json
- 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48861 and previous config saved to /var/cache/conftool/dbconfig/20230606-082304-ladsgroup.json
- 08:15 moritzm: installing openssl security updates on bullseye
- 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48860 and previous config saved to /var/cache/conftool/dbconfig/20230606-080758-ladsgroup.json
- 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48859 and previous config saved to /var/cache/conftool/dbconfig/20230606-075210-ladsgroup.json
- 07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
- 07:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
- 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48858 and previous config saved to /var/cache/conftool/dbconfig/20230606-075149-ladsgroup.json
- 07:47 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_esams and A:cp
- 07:42 dcausse@deploy1002: Finished scap: Backport for ttm: use new config option to separate readable and writable services (T322284) (duration: 15m 20s)
- 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48857 and previous config saved to /var/cache/conftool/dbconfig/20230606-073643-ladsgroup.json
- 07:28 dcausse@deploy1002: dcausse: Backport for ttm: use new config option to separate readable and writable services (T322284) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 07:27 dcausse@deploy1002: Started scap: Backport for ttm: use new config option to separate readable and writable services (T322284)
- 07:22 kharlan@deploy1002: Finished scap: Backport for checkuser: Disable client hints feature by default (T337944) (duration: 08m 14s)
- 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48856 and previous config saved to /var/cache/conftool/dbconfig/20230606-072137-ladsgroup.json
- 07:16 kharlan@deploy1002: kharlan: Backport for checkuser: Disable client hints feature by default (T337944) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 07:14 kharlan@deploy1002: Started scap: Backport for checkuser: Disable client hints feature by default (T337944)
- 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48855 and previous config saved to /var/cache/conftool/dbconfig/20230606-070631-ladsgroup.json
- 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48854 and previous config saved to /var/cache/conftool/dbconfig/20230606-065057-ladsgroup.json
- 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
- 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
- 06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
- 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
- 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
- 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
- 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48853 and previous config saved to /var/cache/conftool/dbconfig/20230606-060807-ladsgroup.json
- 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48852 and previous config saved to /var/cache/conftool/dbconfig/20230606-055301-ladsgroup.json
- 05:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
- 05:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
- 05:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 2518
- 05:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
- 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48851 and previous config saved to /var/cache/conftool/dbconfig/20230606-053755-ladsgroup.json
- 05:34 Amir1: ladsgroup@clouddb1021:/srv/sqldata.s1$ sudo rm db1196* (T337961)
- 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48850 and previous config saved to /var/cache/conftool/dbconfig/20230606-052249-ladsgroup.json
- 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48849 and previous config saved to /var/cache/conftool/dbconfig/20230606-051938-ladsgroup.json
- 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48848 and previous config saved to /var/cache/conftool/dbconfig/20230606-051918-ladsgroup.json
- 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48847 and previous config saved to /var/cache/conftool/dbconfig/20230606-050410-ladsgroup.json
- 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48846 and previous config saved to /var/cache/conftool/dbconfig/20230606-044904-ladsgroup.json
- 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48845 and previous config saved to /var/cache/conftool/dbconfig/20230606-043358-ladsgroup.json
- 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48844 and previous config saved to /var/cache/conftool/dbconfig/20230606-043047-ladsgroup.json
- 04:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48843 and previous config saved to /var/cache/conftool/dbconfig/20230606-043026-ladsgroup.json
- 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48842 and previous config saved to /var/cache/conftool/dbconfig/20230606-041520-ladsgroup.json
- 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48841 and previous config saved to /var/cache/conftool/dbconfig/20230606-040013-ladsgroup.json
- 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48840 and previous config saved to /var/cache/conftool/dbconfig/20230606-034506-ladsgroup.json
- 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48839 and previous config saved to /var/cache/conftool/dbconfig/20230606-034256-ladsgroup.json
- 03:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 03:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48838 and previous config saved to /var/cache/conftool/dbconfig/20230606-034235-ladsgroup.json
- 03:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
- 03:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
- 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48837 and previous config saved to /var/cache/conftool/dbconfig/20230606-032729-ladsgroup.json
- 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48836 and previous config saved to /var/cache/conftool/dbconfig/20230606-031223-ladsgroup.json
- 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48835 and previous config saved to /var/cache/conftool/dbconfig/20230606-025717-ladsgroup.json
- 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48834 and previous config saved to /var/cache/conftool/dbconfig/20230606-025507-ladsgroup.json
- 02:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48833 and previous config saved to /var/cache/conftool/dbconfig/20230606-021622-ladsgroup.json
- 02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48832 and previous config saved to /var/cache/conftool/dbconfig/20230606-020616-ladsgroup.json
- 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48831 and previous config saved to /var/cache/conftool/dbconfig/20230606-020116-ladsgroup.json
- 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48830 and previous config saved to /var/cache/conftool/dbconfig/20230606-015110-ladsgroup.json
- 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48829 and previous config saved to /var/cache/conftool/dbconfig/20230606-014610-ladsgroup.json
- 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48828 and previous config saved to /var/cache/conftool/dbconfig/20230606-013604-ladsgroup.json
- 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48827 and previous config saved to /var/cache/conftool/dbconfig/20230606-013104-ladsgroup.json
- 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48826 and previous config saved to /var/cache/conftool/dbconfig/20230606-012058-ladsgroup.json
- 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48825 and previous config saved to /var/cache/conftool/dbconfig/20230606-010704-ladsgroup.json
- 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48824 and previous config saved to /var/cache/conftool/dbconfig/20230606-010643-ladsgroup.json
- 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48823 and previous config saved to /var/cache/conftool/dbconfig/20230606-005357-ladsgroup.json
- 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48822 and previous config saved to /var/cache/conftool/dbconfig/20230606-005336-ladsgroup.json
- 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48821 and previous config saved to /var/cache/conftool/dbconfig/20230606-005137-ladsgroup.json
- 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48820 and previous config saved to /var/cache/conftool/dbconfig/20230606-003830-ladsgroup.json
- 00:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48819 and previous config saved to /var/cache/conftool/dbconfig/20230606-003631-ladsgroup.json
- 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48818 and previous config saved to /var/cache/conftool/dbconfig/20230606-002324-ladsgroup.json
- 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48817 and previous config saved to /var/cache/conftool/dbconfig/20230606-002125-ladsgroup.json
- 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48816 and previous config saved to /var/cache/conftool/dbconfig/20230606-001914-ladsgroup.json
- 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48815 and previous config saved to /var/cache/conftool/dbconfig/20230606-001836-ladsgroup.json
- 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48814 and previous config saved to /var/cache/conftool/dbconfig/20230606-000818-ladsgroup.json
- 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48813 and previous config saved to /var/cache/conftool/dbconfig/20230606-000330-ladsgroup.json
2023-06-05
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48812 and previous config saved to /var/cache/conftool/dbconfig/20230605-235346-ladsgroup.json
- 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48811 and previous config saved to /var/cache/conftool/dbconfig/20230605-235310-ladsgroup.json
- 23:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) (duration: 07m 02s)
- 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48810 and previous config saved to /var/cache/conftool/dbconfig/20230605-234824-ladsgroup.json
- 23:43 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 23:42 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954)
- 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48809 and previous config saved to /var/cache/conftool/dbconfig/20230605-233804-ladsgroup.json
- 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48808 and previous config saved to /var/cache/conftool/dbconfig/20230605-233318-ladsgroup.json
- 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48807 and previous config saved to /var/cache/conftool/dbconfig/20230605-233107-ladsgroup.json
- 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48806 and previous config saved to /var/cache/conftool/dbconfig/20230605-233046-ladsgroup.json
- 23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 23:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48805 and previous config saved to /var/cache/conftool/dbconfig/20230605-232258-ladsgroup.json
- 23:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 23:22 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48804 and previous config saved to /var/cache/conftool/dbconfig/20230605-231540-ladsgroup.json
- 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
- 23:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
- 23:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 23:11 jforrester@deploy1002: Finished deploy [integration/docroot@6eefe56]: I5c1b92 for T334492 (duration: 00m 05s)
- 23:10 jforrester@deploy1002: Started deploy [integration/docroot@6eefe56]: I5c1b92 for T334492
- 23:09 jforrester@deploy1002: Finished deploy [integration/docroot@ab77611]: Idf6c7a (duration: 00m 08s)
- 23:09 jforrester@deploy1002: Started deploy [integration/docroot@ab77611]: Idf6c7a
- 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48803 and previous config saved to /var/cache/conftool/dbconfig/20230605-230752-ladsgroup.json
- 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48802 and previous config saved to /var/cache/conftool/dbconfig/20230605-230034-ladsgroup.json
- 22:57 mutante: contint2001 - sudo systemctl restart apache2
- 22:57 mutante: contint2001 - sudo apt-get remove --purge libapache2-mod-php7.3 php7.3-cli php7.3-common php7.3-json php7.3-opcache php7.3-readline
- 22:55 jforrester@deploy1002: Finished deploy [integration/docroot@8255d99]: I6c7575 for T337425 (duration: 00m 13s)
- 22:55 jforrester@deploy1002: Started deploy [integration/docroot@8255d99]: I6c7575 for T337425
- 22:53 mutante: contint2001 (prod main CI server) - upgrading PHP 7.3 to 7.4
- 22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954) (duration: 09m 13s)
- 22:46 mutante: contint2002, contint1002 - upgrading PHP from 7.3 to 7.4
- 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48801 and previous config saved to /var/cache/conftool/dbconfig/20230605-224528-ladsgroup.json
- 22:41 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in testwiki (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 22:40 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954)
- 22:37 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) (duration: 09m 04s)
- 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48800 and previous config saved to /var/cache/conftool/dbconfig/20230605-223035-ladsgroup.json
- 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 22:29 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 22:28 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700)
- 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48799 and previous config saved to /var/cache/conftool/dbconfig/20230605-222745-ladsgroup.json
- 22:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
- 22:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
- 22:24 ladsgroup@deploy1002: Finished scap: Backport for Revert "Remove legacy encoding option from dawiktionary" (duration: 07m 40s)
- 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48798 and previous config saved to /var/cache/conftool/dbconfig/20230605-222339-ladsgroup.json
- 22:18 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Remove legacy encoding option from dawiktionary" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 22:17 ladsgroup@deploy1002: Started scap: Backport for Revert "Remove legacy encoding option from dawiktionary"
- 22:13 ladsgroup@deploy1002: Finished scap: Backport for Help measure the impact of saneitizer jobs (T336698) (duration: 09m 48s)
- 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48797 and previous config saved to /var/cache/conftool/dbconfig/20230605-220833-ladsgroup.json
- 22:05 ladsgroup@deploy1002: ladsgroup: Backport for Help measure the impact of saneitizer jobs (T336698) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 22:03 ladsgroup@deploy1002: Started scap: Backport for Help measure the impact of saneitizer jobs (T336698)
- 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1016.eqiad.wmnet
- 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1016.eqiad.wmnet
- 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48796 and previous config saved to /var/cache/conftool/dbconfig/20230605-215345-ladsgroup.json
- 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48795 and previous config saved to /var/cache/conftool/dbconfig/20230605-215326-ladsgroup.json
- 21:51 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1016.eqiad.wmnet
- 21:50 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1016.eqiad.wmnet
- 21:42 urbanecm@deploy1002: Finished scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) (duration: 25m 38s)
- 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48794 and previous config saved to /var/cache/conftool/dbconfig/20230605-213839-ladsgroup.json
- 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48793 and previous config saved to /var/cache/conftool/dbconfig/20230605-213819-ladsgroup.json
- 21:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1015.eqiad.wmnet
- 21:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
- 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48792 and previous config saved to /var/cache/conftool/dbconfig/20230605-213202-ladsgroup.json
- 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 21:30 urbanecm@deploy1002: urbanecm: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 21:29 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 21:29 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 21:25 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
- 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48791 and previous config saved to /var/cache/conftool/dbconfig/20230605-212333-ladsgroup.json
- 21:23 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1015.eqiad.wmnet
- 21:18 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 21:17 urbanecm@deploy1002: Started scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085)
- 21:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (T338093) (duration: 24m 34s)
- 21:15 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48790 and previous config saved to /var/cache/conftool/dbconfig/20230605-210827-ladsgroup.json
- 21:05 urbanecm@deploy1002: urbanecm: Backport for Update interwiki cache (T338093) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 20:51 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache (T338093)
- 20:48 cjming: end of UTC late backport window
- 20:47 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # verify T338180 fix
- away: payments-wiki upgraded from 2b4203df to f3b229c6
- 20:46 cjming@deploy1002: Finished scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" (duration: 09m 57s)
- 20:38 cjming@deploy1002: cjming: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 20:36 cjming@deploy1002: Started scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere""
- 20:35 cjming@deploy1002: Finished scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) (duration: 24m 57s)
- 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48789 and previous config saved to /var/cache/conftool/dbconfig/20230605-202916-ladsgroup.json
- 20:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48788 and previous config saved to /var/cache/conftool/dbconfig/20230605-202855-ladsgroup.json
- 20:23 cjming@deploy1002: cjming: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48787 and previous config saved to /var/cache/conftool/dbconfig/20230605-201349-ladsgroup.json
- 20:10 cjming@deploy1002: Started scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355)
- 20:09 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # attempt to fix permission errors when doing a backport
- 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48786 and previous config saved to /var/cache/conftool/dbconfig/20230605-195842-ladsgroup.json
- 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48785 and previous config saved to /var/cache/conftool/dbconfig/20230605-194336-ladsgroup.json
- 19:32 brett: Maglev LVS scheduler rollout in eqiad finished (puppet re-enabled) - T263797
- 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2011.codfw.wmnet
- 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
- 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48784 and previous config saved to /var/cache/conftool/dbconfig/20230605-190702-ladsgroup.json
- 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48783 and previous config saved to /var/cache/conftool/dbconfig/20230605-190528-ladsgroup.json
- 19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 19:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 19:03 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
- 18:58 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2011.codfw.wmnet
- 18:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
- 18:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: revert - remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 11m 54s)
- 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48782 and previous config saved to /var/cache/conftool/dbconfig/20230605-185156-ladsgroup.json
- 18:48 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
- 18:48 inflatador: bking@cumin1001 depooling wdqs2011 for fw update T331297
- 18:48 inflatador: bking@cumin1001 repooling wdqs2010 T331297
- 18:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
- 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48781 and previous config saved to /var/cache/conftool/dbconfig/20230605-183650-ladsgroup.json
- 18:35 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
- 18:32 inflatador: bking@cumin1001 depooling wdqs2010 for fw update T331297
- 18:30 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: revert - Remove unused page_change rc streams - T336817 (duration: 11m 23s)
- 18:29 sukhe: homer "cr*-eqiad*" commit "Gerrit: 927246 remove old gerrit service IP"
- 18:28 brett: Maglev LVS scheduler rollout in eqiad (puppet disabled) - T263797
- 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48780 and previous config saved to /var/cache/conftool/dbconfig/20230605-182144-ladsgroup.json
- 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48779 and previous config saved to /var/cache/conftool/dbconfig/20230605-181935-ladsgroup.json
- 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48778 and previous config saved to /var/cache/conftool/dbconfig/20230605-181915-ladsgroup.json
- 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 18:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48777 and previous config saved to /var/cache/conftool/dbconfig/20230605-181219-ladsgroup.json
- 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48776 and previous config saved to /var/cache/conftool/dbconfig/20230605-180408-ladsgroup.json
- 17:58 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
- 17:58 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
- 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48775 and previous config saved to /var/cache/conftool/dbconfig/20230605-175712-ladsgroup.json
- 17:50 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: no-op: Remove unused page_change rc streams - T336817 (duration: 20m 11s)
- 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48774 and previous config saved to /var/cache/conftool/dbconfig/20230605-174902-ladsgroup.json
- 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48773 and previous config saved to /var/cache/conftool/dbconfig/20230605-174206-ladsgroup.json
- 17:38 cdanis@deploy1002: Finished scap: Backport for Enable user network probe events (T332024) (duration: 10m 02s)
- 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48772 and previous config saved to /var/cache/conftool/dbconfig/20230605-173356-ladsgroup.json
- 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48771 and previous config saved to /var/cache/conftool/dbconfig/20230605-173002-ladsgroup.json
- 17:30 cdanis@deploy1002: cdanis: Backport for Enable user network probe events (T332024) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 17:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 17:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48770 and previous config saved to /var/cache/conftool/dbconfig/20230605-172942-ladsgroup.json
- 17:28 cdanis@deploy1002: Started scap: Backport for Enable user network probe events (T332024)
- 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48769 and previous config saved to /var/cache/conftool/dbconfig/20230605-172700-ladsgroup.json
- 17:26 cdanis@deploy1002: Backport cancelled.
- 17:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 09m 25s)
- 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48768 and previous config saved to /var/cache/conftool/dbconfig/20230605-172124-ladsgroup.json
- 17:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 17:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48767 and previous config saved to /var/cache/conftool/dbconfig/20230605-172103-ladsgroup.json
- 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48766 and previous config saved to /var/cache/conftool/dbconfig/20230605-171436-ladsgroup.json
- 17:12 cdanis@deploy1002: Backport cancelled.
- 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48765 and previous config saved to /var/cache/conftool/dbconfig/20230605-170557-ladsgroup.json
- 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48764 and previous config saved to /var/cache/conftool/dbconfig/20230605-165929-ladsgroup.json
- 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48763 and previous config saved to /var/cache/conftool/dbconfig/20230605-165051-ladsgroup.json
- 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48762 and previous config saved to /var/cache/conftool/dbconfig/20230605-164423-ladsgroup.json
- 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
- 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48761 and previous config saved to /var/cache/conftool/dbconfig/20230605-163714-ladsgroup.json
- 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48760 and previous config saved to /var/cache/conftool/dbconfig/20230605-163653-ladsgroup.json
- 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48759 and previous config saved to /var/cache/conftool/dbconfig/20230605-163545-ladsgroup.json
- 16:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48758 and previous config saved to /var/cache/conftool/dbconfig/20230605-162707-ladsgroup.json
- 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48757 and previous config saved to /var/cache/conftool/dbconfig/20230605-162629-ladsgroup.json
- 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48756 and previous config saved to /var/cache/conftool/dbconfig/20230605-162147-ladsgroup.json
- 16:21 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
- 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
- 16:19 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
- 16:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
- 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48755 and previous config saved to /var/cache/conftool/dbconfig/20230605-161123-ladsgroup.json
- 16:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48754 and previous config saved to /var/cache/conftool/dbconfig/20230605-160640-ladsgroup.json
- 16:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 16:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:59 bblack: mw1419: manually executing a php restart to test new safe-service-restart
- 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48753 and previous config saved to /var/cache/conftool/dbconfig/20230605-155617-ladsgroup.json
- 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
- 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48752 and previous config saved to /var/cache/conftool/dbconfig/20230605-155134-ladsgroup.json
- 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2013']
- 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48751 and previous config saved to /var/cache/conftool/dbconfig/20230605-154926-ladsgroup.json
- 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48750 and previous config saved to /var/cache/conftool/dbconfig/20230605-154905-ladsgroup.json
- 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48749 and previous config saved to /var/cache/conftool/dbconfig/20230605-154110-ladsgroup.json
- 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
- 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2013']
- 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
- 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48748 and previous config saved to /var/cache/conftool/dbconfig/20230605-153542-ladsgroup.json
- 15:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 15:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48747 and previous config saved to /var/cache/conftool/dbconfig/20230605-153521-ladsgroup.json
- 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48746 and previous config saved to /var/cache/conftool/dbconfig/20230605-153359-ladsgroup.json
- 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
- 15:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
- 15:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
- 15:27 Amir1: on s3 master: update `text` set old_text = 'O:18:"historyblobcurstub":1:{s:6:"mCurId";i:5532;}', old_flags = 'object' where old_id= 14484; (T337700)
- 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48745 and previous config saved to /var/cache/conftool/dbconfig/20230605-152015-ladsgroup.json
- 15:19 moritzm: installing debian-archive-keyring updates on bullseye hosts
- 15:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@674ec0a]: (no justification provided) (duration: 00m 17s)
- 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48744 and previous config saved to /var/cache/conftool/dbconfig/20230605-151853-ladsgroup.json
- 15:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@674ec0a]: (no justification provided)
- 15:18 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767 (duration: 102m 46s)
- 15:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
- 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
- 15:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
- 15:05 moritzm: installing avahi security updates
- 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48742 and previous config saved to /var/cache/conftool/dbconfig/20230605-150509-ladsgroup.json
- 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48741 and previous config saved to /var/cache/conftool/dbconfig/20230605-150347-ladsgroup.json
- 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48740 and previous config saved to /var/cache/conftool/dbconfig/20230605-150138-ladsgroup.json
- 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48739 and previous config saved to /var/cache/conftool/dbconfig/20230605-150117-ladsgroup.json
- 14:55 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 14:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 14:52 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 14:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 14:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 14:50 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48738 and previous config saved to /var/cache/conftool/dbconfig/20230605-145003-ladsgroup.json
- 14:48 sukhe: homer "cr*-codfw*" commit "Gerrit: 927208 remove decommissioned host lvs2009": T335777
- 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2009.codfw.wmnet
- 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
- 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48737 and previous config saved to /var/cache/conftool/dbconfig/20230605-144611-ladsgroup.json
- 14:45 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
- 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48736 and previous config saved to /var/cache/conftool/dbconfig/20230605-144438-ladsgroup.json
- 14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48735 and previous config saved to /var/cache/conftool/dbconfig/20230605-144417-ladsgroup.json
- 14:42 sukhe@cumin2002: START - Cookbook sre.dns.netbox
- 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2009.codfw.wmnet
- 14:31 ejegg: payments-wiki upgraded from c2f9f8b5 to 2b4203df
- 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48734 and previous config saved to /var/cache/conftool/dbconfig/20230605-143105-ladsgroup.json
- 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48733 and previous config saved to /var/cache/conftool/dbconfig/20230605-142911-ladsgroup.json
- 14:28 sukhe: codfw low-traffic LVS: set routing-options static route 10.2.1.0/24 next-hop 10.192.49.7
- 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48732 and previous config saved to /var/cache/conftool/dbconfig/20230605-141559-ladsgroup.json
- 14:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 14:15 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48731 and previous config saved to /var/cache/conftool/dbconfig/20230605-141451-ladsgroup.json
- 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48730 and previous config saved to /var/cache/conftool/dbconfig/20230605-141430-ladsgroup.json
- 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48729 and previous config saved to /var/cache/conftool/dbconfig/20230605-141405-ladsgroup.json
- 14:08 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 14:08 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48728 and previous config saved to /var/cache/conftool/dbconfig/20230605-135924-ladsgroup.json
- 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48727 and previous config saved to /var/cache/conftool/dbconfig/20230605-135859-ladsgroup.json
- 13:57 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 13:56 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48726 and previous config saved to /var/cache/conftool/dbconfig/20230605-135332-ladsgroup.json
- 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48725 and previous config saved to /var/cache/conftool/dbconfig/20230605-135311-ladsgroup.json
- 13:46 moritzm: installing python-ipaddress security updates
- 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
- 13:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
- 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48724 and previous config saved to /var/cache/conftool/dbconfig/20230605-134418-ladsgroup.json
- 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
- 13:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
- 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48723 and previous config saved to /var/cache/conftool/dbconfig/20230605-134313-ladsgroup.json
- 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48722 and previous config saved to /var/cache/conftool/dbconfig/20230605-133805-ladsgroup.json
- 13:36 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767
- 13:35 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937 (duration: 01m 06s)
- 13:35 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
- 13:35 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs (duration: 05m 54s)
- 13:32 bblack: lvs1* (eqiad) - restart pybal for T334703 IPs
- 13:29 bblack: lvs2* (codfw) - restart pybal for T334703 IPs
- 13:29 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs
- 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48721 and previous config saved to /var/cache/conftool/dbconfig/20230605-132911-ladsgroup.json
- 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48720 and previous config saved to /var/cache/conftool/dbconfig/20230605-132807-ladsgroup.json
- 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48719 and previous config saved to /var/cache/conftool/dbconfig/20230605-132703-ladsgroup.json
- 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48718 and previous config saved to /var/cache/conftool/dbconfig/20230605-132642-ladsgroup.json
- 13:25 hashar: Restarted Zuul due to stalled ssh connection # T309376
- 13:25 bblack: lvs3* (esams) - restart pybal for T334703 IPs
- 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48717 and previous config saved to /var/cache/conftool/dbconfig/20230605-132259-ladsgroup.json
- 13:19 bblack: lvs5* (eqsin) - restart pybal for T334703 IPs
- 13:17 Lucas_WMDE: UTC afternoon backport+config window done
- 13:15 bblack: lvs6* (drmrs) - restart pybal for T334703 IPs
- 13:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140) (duration: 10m 06s)
- 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48716 and previous config saved to /var/cache/conftool/dbconfig/20230605-131301-ladsgroup.json
- 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48715 and previous config saved to /var/cache/conftool/dbconfig/20230605-131136-ladsgroup.json
- 13:09 bblack: lvs4* (ulsfo) - restart pybal for T334703 IPs
- 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48714 and previous config saved to /var/cache/conftool/dbconfig/20230605-130753-ladsgroup.json
- 13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make outreachwiki a multilingual Wikidata client (T171140) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 13:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140)
- 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48713 and previous config saved to /var/cache/conftool/dbconfig/20230605-130228-ladsgroup.json
- 13:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48712 and previous config saved to /var/cache/conftool/dbconfig/20230605-125754-ladsgroup.json
- 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48711 and previous config saved to /var/cache/conftool/dbconfig/20230605-125630-ladsgroup.json
- 12:52 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 12:51 Amir1: killed prioritizeFilesWithTemplate.php, stopping depool maint.
- 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 12:44 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48710 and previous config saved to /var/cache/conftool/dbconfig/20230605-124444-ladsgroup.json
- 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
- 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
- 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48709 and previous config saved to /var/cache/conftool/dbconfig/20230605-124124-ladsgroup.json
- 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48708 and previous config saved to /var/cache/conftool/dbconfig/20230605-123915-ladsgroup.json
- 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 12:39 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 12:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 12:17 jynus: creating a copy of db1157 binlogs on dbprov1004 T338128
- 12:15 bblack: lvs*: disabling puppet to roll out new LVS IPs in https://gerrit.wikimedia.org/r/c/operations/puppet/+/924593 - T334703
- 12:15 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
- 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
- 11:45 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
- 11:39 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
- 11:21 moritzm: restarting Exim on MXes to pick up OpenSSL updates
- 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
- 11:13 moritzm: bounced ferm on ml-serve2006 (race caused by firewall profile change)
- 11:08 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
- 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
- 10:29 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
- 10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
- 10:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
- 10:11 moritzm: installing openssl security updates on Bullseye
- 10:08 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 10:06 godog: truncate xff.log and JobExecutor.log on mwlog1002 to reclaim space - T338127
- 09:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 09:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 09:39 claime: roll-restart thumbor in eqiad - T337649
- 09:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 09:38 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=thumbor.*
- 09:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 09:37 claime: roll-restart thumbor in codfw - T337649
- 08:40 claime: power-cycling restbase1027 - T338122
- 07:54 moritzm: installing containerd security updates
- 07:38 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) (duration: 09m 58s)
- 07:30 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 07:28 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669)
- 07:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 07:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 07:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 07:21 taavi@deploy1002: Finished scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) (duration: 18m 27s)
- 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
- 07:12 taavi@deploy1002: mlitn and taavi: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
- 07:02 taavi@deploy1002: Started scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870)
- 06:20 _joe_: killing a pod with consistently high haproxy queue for thumbor in codfw
- 06:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 60427
- 06:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 60427
2023-06-03
- 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
- 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
- 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
- 13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
2023-06-02
- 20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
- 18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
- 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
- 17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
- 17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups
- 16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed
- 15:35 sukhe: restart ntp.service on dns1002
- 13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479
- 13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release
- 09:35 moritzm: installing texlive-security updates on buster
- 09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) T337836. Reload systemd for unit changes to take effect
- 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet
- 08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet
- 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet
- 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet
- 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet
- 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet
- 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet
- 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet
- 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts:
- 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts:
- 08:42 moritzm: installing traceroute bugfix updates from Bullseye point release
- 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
- 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
- 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org
- 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org
- 07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
- 07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
- 01:53 ejegg: fundraising python tools upgraded from 759d4c89 to 2ca83336
- 01:22 cstone: civicrm upgraded from 3819d6d1 to bcc8fccc
2023-06-01
- 21:06 samtar@deploy1002: Finished scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) (duration: 08m 30s)
- 20:59 samtar@deploy1002: esanders and samtar: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 20:57 samtar@deploy1002: Started scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955)
- 20:54 samtar@deploy1002: Finished scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) (duration: 10m 29s)
- 20:45 samtar@deploy1002: samtar and ksarabia: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 20:44 samtar@deploy1002: Started scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955)
- 20:21 samtar@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on enwiki (T336092) (duration: 07m 56s)
- 20:15 samtar@deploy1002: dani and samtar: Backport for Deploy Research Incentive survey on enwiki (T336092) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 20:13 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive survey on enwiki (T336092)
- 20:12 samtar@deploy1002: Finished scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) (duration: 08m 20s)
- 20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 20:04 samtar@deploy1002: Started scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726)
- 19:51 ejegg: fundraising python tools upgraded from 72570bdd to 759d4c89
- 19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s)
- 19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided)
- 19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s)
- 19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - T334703
- 19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work
- 18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - T334703
- 18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - T334703
- 18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs T337525
- 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s)
- 17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp
- 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided)
- 17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
- 17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
- 17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp
- 17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
- 17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
- 17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
- 17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
- 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
- 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
- 16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove unneeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - T336817 (duration: 07m 24s)
- 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
- 16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
- 16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
- 16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting - T336817 (duration: 08m 18s)
- 16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - T334703
- 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
- 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
- 16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - T334703
- 16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703 (round 2!)
- 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye
- 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
- 16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
- 16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
- 16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - T337041
- 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
- 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
- 16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
- 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
- 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
- 15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook
- 15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703
- 15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes
- 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye
- 14:59 moritzm: installing python-sqlparse security updates
- 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
- 14:53 moritzm: installing jackson-databind security updates
- 14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook
- 14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
- 14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
- 14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp
- 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp
- 14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 14:29 moritzm: installing imagemagick security updates on buster
- 14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye
- 14:14 fabfur: Disabled puppet on A:cp-drmrs for T323557
- 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s)
- 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided)
- 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json
- 14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - T337505
- 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json
- 13:52 moritzm: installing sysstat security updates
- 13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json
- 13:29 moritzm: installing openssl security updates on bullseye
- 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json
- 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json
- 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json
- 13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - T337505
- 13:13 urbanecm@deploy1002: Finished scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) (duration: 11m 08s)
- 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json
- 13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
- 13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
- 13:02 urbanecm@deploy1002: Started scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)
- 12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json
- 12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json
- 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json
- 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json
- 12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json
- 12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # T336365
- 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json
- 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json
- 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json
- 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
- 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
- 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json
- 11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json
- 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json
- 11:04 jayme: disabling puppet on all kubernetes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707
- 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json
- 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json
- 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
- 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
- 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json
- 10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json
- 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json
- 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
- 10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
- 10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json
- 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json
- 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 09:56 moritzm: installing systemd security updates on bullseye
- 09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
- 09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G
- 09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev']
- 09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
- 09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev']
- 09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
- 09:32 volans: installed spicerack v7.2.0 on cumin2002
- 09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
- 09:18 godog: remove lv prometheus-global - T288196
- 09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
- 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
- 09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
- 09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
- 09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
- 09:12 volans: installed spicerack v7.2.0 on cumin1001
- 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
- 09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
- 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
- 09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
- 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
- 08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
- 08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
- 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
- 08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
- 08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 08:48 apergos: UTC morning backport and config training window done
- 08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:28 daniel@deploy1002: Finished scap: Backport for ORES: add model versions configuration and thresholds (T319170) (duration: 10m 12s)
- 08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:19 daniel@deploy1002: daniel and isaranto: Backport for ORES: add model versions configuration and thresholds (T319170) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 08:18 daniel@deploy1002: Started scap: Backport for ORES: add model versions configuration and thresholds (T319170)
- 07:55 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) (duration: 09m 09s)
- 07:48 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 07:46 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366)
- 07:42 mlitn@deploy1002: Finished scap: Backport for Add $wgInterwikiLogoOverride (T315269) (duration: 33m 02s)
- 07:35 moritzm: installing libssh security updates
- 07:29 mlitn@deploy1002: mlitn: Backport for Add $wgInterwikiLogoOverride (T315269) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
- 07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
- 07:09 mlitn@deploy1002: Started scap: Backport for Add $wgInterwikiLogoOverride (T315269)
- 06:16 kart_: Updated MinT to 2023-06-01-041041-production (T336525)
- 06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: applied
- 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 05:39 kart_: Updated cxserver to 2023-06-01-041016-production (T337669)
- 05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 00:11 eileen: civicrm upgraded from 885208ca to 3819d6d1