Server Admin Log
== 2023-06-02 ==
* 20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
* 18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
* 17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
* 17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups
* 16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed
* 15:35 sukhe: restart ntp.service on dns1002
* 13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479
* 13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release
* 09:35 moritzm: installing texlive-security updates on buster
* 09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) [[phab:T337836|T337836]]. Reload systemd for unit changes to take effect
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts:
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts:
* 08:42 moritzm: installing traceroute bugfix updates from Bullseye point release
* 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org
* 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org
* 07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
* 07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
* 01:53 ejegg: fundraising python tools upgraded from {{Gerrit|759d4c89}} to {{Gerrit|2ca83336}}
* 01:22 cstone: civicrm upgraded from {{Gerrit|3819d6d1}} to {{Gerrit|bcc8fccc}}
== 2023-06-01 ==
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] (duration: 08m 30s)
* 20:59 samtar@deploy1002: esanders and samtar: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:57 samtar@deploy1002: Started scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]]
* 20:54 samtar@deploy1002: Finished scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] (duration: 10m 29s)
* 20:45 samtar@deploy1002: samtar and ksarabia: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:44 samtar@deploy1002: Started scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]]
* 20:21 samtar@deploy1002: Finished scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] (duration: 07m 56s)
* 20:15 samtar@deploy1002: dani and samtar: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:13 samtar@deploy1002: Started scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]]
* 20:12 samtar@deploy1002: Finished scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] (duration: 08m 20s)
* 20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]]
* 19:51 ejegg: fundraising python tools upgraded from {{Gerrit|72570bdd}} to {{Gerrit|759d4c89}}
* 19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s)
* 19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided)
* 19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s)
* 19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work
* 18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]]
* 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s)
* 17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp
* 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided)
* 17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp
* 17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove undeeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - [[phab:T336817|T336817]] (duration: 07m 24s)
* 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove undeeded wgEventBusStreamNamesMap override setting - [[phab:T336817|T336817]] (duration: 08m 18s)
* 16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] (round 2!)
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
* 16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
* 16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
* 16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - [[phab:T337041|T337041]]
* 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
* 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook
* 15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes
* 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye
* 14:59 moritzm: installing python-sqlparse security updates
* 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:53 moritzm: installing jackson-databind security updates
* 14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook
* 14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
* 14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
* 14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp
* 14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 14:29 moritzm: installing imagemagick security updates on buster
* 14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye
* 14:14 fabfur: Disabled puppet on A:cp-drmrs for [[phab:T323557|T323557]]
* 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s)
* 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided)
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json
* 14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - [[phab:T337505|T337505]]
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json
* 13:52 moritzm: installing sysstat security updates
* 13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json
* 13:29 moritzm: installing openssl security updates on bullseye
* 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json
* 13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - [[phab:T337505|T337505]]
* 13:13 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] (duration: 11m 08s)
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json
* 13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 13:02 urbanecm@deploy1002: Started scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]]
* 12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json
* 12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json
* 12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json
* 12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # [[phab:T336365|T336365]]
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json
* 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json
* 11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json
* 11:04 jayme: disabling puppet on all kubernestes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json
* 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json
* 10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json
* 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json
* 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:56 moritzm: installing systemd security updates on bullseye
* 09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php ([[phab:T336224|T336224]])
* 09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G
* 09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:32 volans: installed spicerack v7.2.0 on cumin2002
* 09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
* 09:18 godog: remove lv prometheus-global - [[phab:T288196|T288196]]
* 09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
* 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
* 09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
* 09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
* 09:12 volans: installed spicerack v7.2.0 on cumin1001
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
* 09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
* 08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
* 08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
* 08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
* 08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 08:48 apergos: UTC morning backport and config training window done
* 08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:28 daniel@deploy1002: Finished scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] (duration: 10m 12s)
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:19 daniel@deploy1002: daniel and isaranto: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 08:18 daniel@deploy1002: Started scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] | |||
* 07:55 daniel@deploy1002: Finished scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] (duration: 09m 09s) | |||
* 07:48 daniel@deploy1002: daniel: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 07:46 daniel@deploy1002: Started scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] | |||
* 07:42 mlitn@deploy1002: Finished scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] (duration: 33m 02s) | |||
* 07:35 moritzm: installing libssh security updates | |||
* 07:29 mlitn@deploy1002: mlitn: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress | |||
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress | |||
* 07:09 mlitn@deploy1002: Started scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] | |||
* 06:16 kart_: Updated MinT to 2023-06-01-041041-production ([[phab:T336525|T336525]]) | |||
* 06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 05:39 kart_: Updated cxserver to 2023-06-01-041016-production ([[phab:T337669|T337669]])
* 05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 00:11 eileen: civicrm upgraded from {{Gerrit|885208ca}} to {{Gerrit|3819d6d1}}
== Archives ==