You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Server Admin Log: Difference between revisions
Jump to navigation
Jump to search
imported>Labslogbot (Re-pooling mw1159 and mw1160; ran out of time for debugging. (ori)) |
imported>Stashbot (sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be) |
||
Line 1: | Line 1: | ||
== | == 2023-03-29 == | ||
* | * 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be | ||
* 00: | * 00:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn | ||
* 00:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet | |||
* 00:37 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet | |||
* 00:30 sukhe: restart pybal on lvs1018 to hopefully resolve flapping BGP session | |||
* 00:06 zabe@deploy2002: Finished scap: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]] (duration: 07m 19s) | |||
* 00:00 zabe@deploy2002: zabe: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
== | == 2023-03-28 == | ||
* 22: | * 23:59 zabe@deploy2002: Started scap: Backport for [[gerrit:903803{{!}}throttle: Remove expired throttle]] | ||
* 21: | * 23:46 zabe@deploy2002: Finished scap: [[phab:T331831|T331831]] (duration: 06m 50s) | ||
* 20:51 | * 23:39 zabe@deploy2002: Started scap: [[phab:T331831|T331831]] | ||
* 05: | * 23:34 zabe@deploy2002: Finished scap: [[phab:T331831|T331831]] (duration: 07m 01s) | ||
* | * 23:27 zabe@deploy2002: Started scap: [[phab:T331831|T331831]] | ||
* | * 23:27 zabe: central Kurdish Wiktionary (ckbwiktionary) | ||
* | * 22:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit1003.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 22:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 22:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002" | |||
* 22:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for gerrit1003 - pt1979@cumin2002" | |||
* 22:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox | |||
* 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:44 eileen: civicrm upgraded from {{Gerrit|db3b727e}} to {{Gerrit|183d131d}} | |||
* 21:23 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments (duration: 00m 13s) | |||
* 21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9b31c6b]: correct mw_sql_to_hive.py cli arguments | |||
* 21:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:06 urandom: updating image_suggestions default table TTL(s) from {{Gerrit|1209600}} to {{Gerrit|1814400}} (seconds) — [[phab:T333319|T333319]] | |||
* 21:05 phedenskog@deploy2002: Finished deploy [performance/navtiming@4d22874]: (no justification provided) (duration: 00m 06s) | |||
* 21:05 phedenskog@deploy2002: Started deploy [performance/navtiming@4d22874]: (no justification provided) | |||
* 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:03 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]] (duration: 28m 53s) | |||
* 20:51 urbanecm@deploy2002: urbanecm and matmarex: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:34 urbanecm@deploy2002: Started scap: Backport for [[gerrit:903684{{!}}Only run edit check on main namespace]], [[gerrit:903685{{!}}Change name of the editcheck-needreference tag to editcheck-references]], [[gerrit:903759{{!}}Enable hidden tag for "Edit Check" project on Wikipedias (T324733)]] | |||
* 20:27 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes (duration: 00m 13s) | |||
* 20:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6febfd]: increase dynamic partition limit when importing cirrus indexes | |||
* 20:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:02 ejegg: payments-wiki upgraded from {{Gerrit|f5ec2677}} to {{Gerrit|b5df483f}} | |||
* 19:29 dduvall@deploy2002: Pruned MediaWiki: 1.40.0-wmf.27 (duration: 02m 11s) | |||
* 19:26 dduvall@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]] (duration: 07m 24s) | |||
* 19:19 dduvall@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]] | |||
* 18:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:37 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance (duration: 00m 20s) | |||
* 18:36 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@0f1c9e8]: Deploy latest image_suggestions on platform_eng Airflow instance | |||
* 18:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1208.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1207.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002" | |||
* 18:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new db nodes - pt1979@cumin2002" | |||
* 18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 17:57 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 17:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=ats-be | |||
* 16:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet,service=cdn | |||
* 16:52 volans: uploaded spicerack_6.4.0 to apt.wikimedia.org bullseye-wikimedia (but I'll deploy it to the cumin hosts tomorrow) | |||
* 16:10 jnuche@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]] (duration: 49m 52s) | |||
* 16:09 bblack: reboot cp1082 (NIC issues) | |||
* 16:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=ats-be | |||
* 16:03 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1082.eqiad.wmnet,service=cdn | |||
* 16:00 inflatador: bking@cumin1001 unban elastic and cloudelastic nodes post maintenance [[phab:T330165|T330165]] | |||
* 15:57 btullis@deploy2002: Finished deploy [analytics/refinery@6554ec0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6554ec0] (duration: 01m 32s) | |||
* 15:20 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]] | |||
* 15:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database | |||
* 15:15 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database | |||
* 15:14 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 15:08 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 15:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 15:05 jnuche@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.41.0-wmf.2" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 03s) | |||
* 15:05 jnuche@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.2 refs [[phab:T330208|T330208]] | |||
* 14:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw | |||
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=eqiad | |||
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=device-analytics,name=pki | |||
* 14:53 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=device-analytics,name=eqiad | |||
* 14:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=device-analytics | |||
* 14:51 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row B switches upgrade done - [[phab:T330165|T330165]] | |||
* 14:48 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet | |||
* 14:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet | |||
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor100[12].eqiad.wmnet | |||
* 14:38 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet | |||
* 14:32 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches upgrade done - [[phab:T330165|T330165]] | |||
* 14:31 sukhe: run authdns-update to revert eqiad depool | |||
* 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web | |||
* 14:25 filippo@cumin1001: conftool action : set/pooled=no; selector: name=THANOS-FE-OLD-FQDN,service=thanos-web | |||
* 14:05 XioNoX: reboot eqiad row B for upgrade - [[phab:T330165|T330165]] | |||
* 13:58 godog: depool thanos-fe1002 - [[phab:T330165|T330165]] | |||
* 13:54 Emperor: depool ms-fe1010 before switch work [[phab:T330165|T330165]] | |||
* 13:53 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet | |||
* 13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 249 hosts with reason: eqiad row B upgrade | |||
* 13:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet | |||
* 13:47 akosiaris: depool swift in eqiad for row B upgrade | |||
* 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad | |||
* 13:47 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad | |||
* 13:46 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync | |||
* 13:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 249 hosts with reason: eqiad row B upgrade | |||
* 13:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync | |||
* 13:45 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync | |||
* 13:44 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync | |||
* 13:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad | |||
* 13:41 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad | |||
* 13:36 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 13:34 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor,name=eqiad | |||
* 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1002.eqiad.wmnet | |||
* 13:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet | |||
* 13:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 13:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row B switches upgrade - [[phab:T330165|T330165]] | |||
* 12:59 XioNoX: depool eqiad for network maintenance - [[phab:T330165|T330165]] | |||
* 12:58 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches upgrade - [[phab:T330165|T330165]] | |||
* 12:57 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 12:56 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 12:56 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 | |||
* 12:44 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108 | |||
* 12:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 | |||
* 12:43 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108 | |||
* 12:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 | |||
* 12:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108 | |||
* 12:36 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye | |||
* 12:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112 | |||
* 12:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112 | |||
* 12:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage | |||
* 12:21 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage | |||
* 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 12:20 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | |||
* 12:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295 | |||
* 12:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 45295 | |||
* 12:09 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye | |||
* 11:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 11:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 11:56 elukey: dist-upgrade kafka-main1002 to debian bullseye - [[phab:T332013|T332013]] | |||
* 11:51 ladsgroup@deploy2002: Finished scap: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]] (duration: 18m 42s) | |||
* 11:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 11:37 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:34 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 11:34 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 11:32 ladsgroup@deploy2002: Started scap: Backport for [[gerrit:903549{{!}}api: Mark query as read-only to avoid regex on SQL (T332942)]] | |||
* 11:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 11:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 11:22 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 11:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 11:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 11:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | |||
* 10:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage | |||
* 10:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage | |||
* 09:56 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues | |||
* 09:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues | |||
* 09:41 vgutierrez: resetting cp2035 management card - [[phab:T333312|T333312]] | |||
* 09:38 elukey: dist-upgrade kafka-main1001 to bullseye - [[phab:T332013|T332013]] | |||
* 09:36 godog: silence systemdunitfailed alerts for team=wmcs - [[phab:T333315|T333315]] | |||
* 09:35 vgutierrez: depool cp2035 - [[phab:T333312|T333312]] | |||
* 09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 09:12 jbond@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts | |||
* 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts | |||
* 09:11 jbond@cumin1001: END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts | |||
* 09:11 jbond@cumin1001: START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts | |||
* 08:58 vgutierrez: restart ipmiseld on cp2035 | |||
* 08:50 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org | |||
* 08:49 ayounsi@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:48 AndyRussG: update payments.wiki config {{Gerrit|65bedd4a}} -> {{Gerrit|e31ffd7d}}, payments (automatic updates only) {{Gerrit|a6c6c2b1}} -> {{Gerrit|f5ec2677}} | |||
* 08:45 ayounsi@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 08:43 ayounsi@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 08:42 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org | |||
* 08:39 ayounsi@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 08:37 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:35 ayounsi@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:34 ayounsi@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:32 ayounsi@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:32 phedenskog@deploy2002: Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s) | |||
* 08:32 phedenskog@deploy2002: Started deploy [performance/navtiming@e757bdf]: (no justification provided) | |||
* 08:31 ayounsi@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. | |||
* 08:29 ayounsi@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. | |||
* 08:25 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:21 ayounsi@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:14 ayounsi@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 08:11 oblivian@deploy2002: Finished scap: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]] (duration: 08m 48s) | |||
* 08:08 ayounsi@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. | |||
* 08:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance | |||
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance | |||
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance | |||
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance | |||
* 08:04 oblivian@deploy2002: oblivian and filippo: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance | |||
* 08:03 ayounsi@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance | |||
* 08:02 oblivian@deploy2002: Started scap: Backport for [[gerrit:903209{{!}}Failover statsd to graphite2004 (T330165)]] | |||
* 08:02 ayounsi@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:00 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 08:00 godog: move graphite reads to codfw - [[phab:T330165|T330165]] | |||
* 07:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 07:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 07:56 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 07:54 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 07:54 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 07:51 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 07:51 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45965 and previous config saved to /var/cache/conftool/dbconfig/20230328-073122-root.json | |||
* 07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17806 | |||
* 07:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 17806 | |||
* 07:20 kartik@deploy2002: Finished scap: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]] (duration: 12m 05s) | |||
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45964 and previous config saved to /var/cache/conftool/dbconfig/20230328-071617-root.json | |||
* 07:10 kartik@deploy2002: kartik: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:08 kartik@deploy2002: Started scap: Backport for [[gerrit:903003{{!}}Enable Section Translation on some wikis while Content Translation remains in beta (T308834)]] | |||
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45963 and previous config saved to /var/cache/conftool/dbconfig/20230328-070112-root.json | |||
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45962 and previous config saved to /var/cache/conftool/dbconfig/20230328-064607-root.json | |||
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45961 and previous config saved to /var/cache/conftool/dbconfig/20230328-063103-root.json | |||
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45960 and previous config saved to /var/cache/conftool/dbconfig/20230328-061558-root.json | |||
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 [[phab:T329481|T329481]]', diff saved to https://phabricator.wikimedia.org/P45959 and previous config saved to /var/cache/conftool/dbconfig/20230328-061441-root.json | |||
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P45958 and previous config saved to /var/cache/conftool/dbconfig/20230328-060053-root.json | |||
* 05:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply | |||
* 05:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply | |||
* 05:53 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | |||
* 05:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply | |||
* 05:47 AndyRussG: update payments-wiki {{Gerrit|f5e262d1}} -> {{Gerrit|a6c6c2b1}} | |||
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P45957 and previous config saved to /var/cache/conftool/dbconfig/20230328-054548-root.json | |||
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P45956 and previous config saved to /var/cache/conftool/dbconfig/20230328-053043-root.json | |||
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45955 and previous config saved to /var/cache/conftool/dbconfig/20230328-051539-root.json | |||
* 01:59 krinkle@deploy2002: Synchronized wmf-config/mc.php: {{Gerrit|I44edcd46da45b827d}} (duration: 06m 33s) | |||
== | == 2023-03-27 == | ||
* | * 23:47 mutante: people1003 - taking down apache to provoke monitoring alert (inactive instances) and confirm IRC alerting change works | ||
* 16: | * 23:31 zabe: deployed patch for [[phab:T330968|T330968]] | ||
* 16: | * 23:08 zabe@deploy2002: Finished scap: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]] (duration: 21m 27s) | ||
* | * 23:00 zabe@deploy2002: zabe: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 22:48 mutante: stat1005 - kill 18179; run puppet ; stat1007 - kill 3346; run puppet ; stat1006 - kill 23887 run puppet | ||
* | * 22:47 zabe@deploy2002: Started scap: Backport for [[gerrit:903205{{!}}Rename "Support and Safety" to "Trust and Safety" (T330514)]] | ||
* 13: | * 22:43 mutante: stat1004 - kill 29291; run puppet | ||
* | * 22:43 mutante: apt2001 - kill 3105; run puppet | ||
* | * 22:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Meta:WMF Support and Safety" "Meta:WMF Trust and Safety" "Zabe" --reason "per [[:phab:T330514{{!}}T330514]]" # [[phab:T330514|T330514]] | ||
* | * 21:58 maryum: Deploy security fix for [[phab:T326952|T326952]] | ||
* | * 21:58 urandom: power cycling restbase1033 — [[phab:T333243|T333243]] | ||
* | * 21:45 ryankemper: [[phab:T330165|T330165]] Depooled relevant search platform hosts: `sudo -E cumin 'elastic[1055-1056,1074-1079,1085-1086]*,cloudelastic100[2,6]*,wcqs1002*,wdqs[1007,1012]*' 'sudo depool'` | ||
* 21:24 Amir1: start of watchlist clean up in arwiki ([[phab:T328501|T328501]]) | |||
* 21:23 kindrobot: finish UTC late backports | |||
* 21:22 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]] (duration: 08m 26s) | |||
* 21:15 kindrobot@deploy2002: kindrobot and superpes: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 21:15 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided) (duration: 00m 13s) | |||
* 21:14 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5f0eb44]: (no justification provided) | |||
* 21:14 kindrobot@deploy2002: Started scap: Backport for [[gerrit:903326{{!}}Disable VisualEditor from talk namespace]], [[gerrit:903323{{!}}[sysop_itwiki] Add the logo also for vector 2022 (T330279)]] | |||
* 21:11 tzatziki: moving Universal Code of Conduct/Enforcement guidelines -> Universal Code of Conduct/Enforcement guidelines/Version 1 on metawiki with `extensions/Translate/scripts/moveTranslatableBundle.php ` | |||
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1022.eqiad.wmnet | |||
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1022.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:41 andrew@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1022.eqiad.wmnet | |||
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1021.eqiad.wmnet | |||
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:33 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:31 andrew@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1021.eqiad.wmnet | |||
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1017.eqiad.wmnet | |||
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:23 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:21 andrew@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:20 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]] (duration: 10m 50s) | |||
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1017.eqiad.wmnet | |||
* 20:11 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:10 kindrobot@deploy2002: Started scap: Backport for [[gerrit:903322{{!}}Expand list of wikis with language button at top. (T331777)]], [[gerrit:902197{{!}}Enable web based viewing of ReadingLists on mediawiki.org and metawiki (T322093)]] | |||
* 20:01 kindrobot: start UTC late backport window | |||
* 19:21 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2 (duration: 00m 14s) | |||
* 19:21 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3259099]: bump glent jar to 0.3.2 | |||
* 19:06 jgleeson: civicrm upgraded from {{Gerrit|09373b9d}} to {{Gerrit|db3b727e}} | |||
* 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 16:40 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 16:39 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 16:39 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 16:34 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 16:34 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 16:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 16:33 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 16:25 jgleeson: payments-wiki upgraded from {{Gerrit|36366f64}} to {{Gerrit|f5e262d1}} | |||
* 15:55 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided) (duration: 00m 11s) | |||
* 15:54 ebysans@deploy2002: Started deploy [airflow-dags/analytics@e7f9c7f]: (no justification provided) | |||
* 15:20 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 15:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 15:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 15:17 elukey@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 10s) | |||
* 15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict1002.eqiad.wmnet | |||
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict1002.eqiad.wmnet on all recursors | |||
* 14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict1002.eqiad.wmnet on all recursors | |||
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001" | |||
* 14:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict1002.eqiad.wmnet - eoghan@cumin1001" | |||
* 14:52 eoghan@cumin1001: START - Cookbook sre.dns.netbox | |||
* 14:52 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host aphlict1002.eqiad.wmnet | |||
* 14:48 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 14:48 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 14:47 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 14:47 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 14:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:44 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:43 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:40 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 14:29 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 14:29 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 14:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 14:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 14:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 14:28 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 14:27 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:16 taavi: taavi@mwmaint2002 ~ $ mwscript namespaceDupes.php --wiki=huwiki --fix # [[phab:T333083|T333083]] | |||
* 14:15 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:15 taavi@deploy2002: Finished scap: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]] (duration: 08m 27s) | |||
* 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:14 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:14 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:08 taavi@deploy2002: taavi: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 14:06 taavi@deploy2002: Started scap: Backport for [[gerrit:903194{{!}}namespaceDupes: Remove extra addQuotes() calls (T333166)]] | |||
* 13:35 fab@deploy2002: Finished deploy [airflow-dags/research@d2c115d]: (no justification provided) (duration: 00m 21s) | |||
* 13:35 fab@deploy2002: Started deploy [airflow-dags/research@d2c115d]: (no justification provided) | |||
* 13:12 taavi@deploy2002: Finished scap: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]] (duration: 08m 45s) | |||
* 13:04 taavi@deploy2002: superpes and taavi: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:03 taavi@deploy2002: Started scap: Backport for [[gerrit:902888{{!}}[huwiki] Add Draft and Draft_talk namespaces (T333083)]] | |||
* 12:42 godog: flip alert* to overlay2 - [[phab:T329939|T329939]] | |||
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons. | |||
* 10:31 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply | |||
* 10:30 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply | |||
* 10:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | |||
* 10:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply | |||
* 10:10 elukey: dist-upgrade kafka-main1003 manually to bullseye - [[phab:T332013|T332013]] | |||
* 10:03 Emperor: depool ms-fe2009 | |||
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 09:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1003.eqiad.wmnet with reason: stop kafka and dist-upgrade | |||
* 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45295 | |||
* 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45295 | |||
* 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:39 cgoubert@cumin1001: START - Cookbook sre.dns.netbox | |||
* 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 08:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001" | |||
* 08:57 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for mw-api-int - cgoubert@cumin1001" | |||
* 08:55 cgoubert@cumin1001: START - Cookbook sre.dns.netbox | |||
* 08:47 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons. | |||
* 08:39 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]] (duration: 18m 15s) | |||
* 08:30 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 08:28 jynus: restarting bacula at backup1001 [[phab:T331510|T331510]] | |||
* 08:25 urbanecm@deploy2002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|63dd23b5ceaba35c8d9682493dd21d99a20fc8f7}}: [Growth] eswiki: Enable mentorship for 50% of newcomers ([[phab:T332737|T332737]], [[phab:T285235|T285235]]) (duration: 06m 09s) | |||
* 08:21 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:903186{{!}}EntityUsageTable: Mark query as read-only (T332941)]] | |||
* 08:18 urbanecm@deploy2002: Backport cancelled. | |||
* 08:06 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]] (duration: 07m 52s) | |||
* 08:03 marostegui: Failover m1 from db1164 to db1101 - [[phab:T331510|T331510]] | |||
* 08:00 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:58 urbanecm@deploy2002: Started scap: Backport for [[gerrit:902734{{!}}GrowthMentors.json: Add a write-only username field (T331444)]] | |||
* 07:55 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]] (duration: 16m 45s) | |||
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45949 and previous config saved to /var/cache/conftool/dbconfig/20230327-075206-root.json | |||
* 07:48 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 07:39 jynus: disabling puppet and shutding down bacula at backup1001 [[phab:T331510|T331510]] | |||
* 07:38 urbanecm@deploy2002: Started scap: Backport for [[gerrit:902741{{!}}SpecialWikiSets: Avoid calling WikiSet::getId on null (T333075)]] | |||
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45948 and previous config saved to /var/cache/conftool/dbconfig/20230327-073701-root.json | |||
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45947 and previous config saved to /var/cache/conftool/dbconfig/20230327-072156-root.json | |||
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45946 and previous config saved to /var/cache/conftool/dbconfig/20230327-070651-root.json | |||
* 06:51 marostegui: dbmaint s3 eqiad Rename flaggedrevs tables on db1123 ptwikisource [[phab:T332594|T332594]] | |||
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45945 and previous config saved to /var/cache/conftool/dbconfig/20230327-065147-root.json | |||
* 06:40 marostegui: Rename flaggedrevs tables on db1123 ptwikisource [[phab:T332594|T332594]] | |||
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45944 and previous config saved to /var/cache/conftool/dbconfig/20230327-063642-root.json | |||
* 05:40 kart_: Updated cxserver to 2023-03-17-133444-production ([[phab:T332379|T332379]] + build changes) | |||
* 05:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 05:37 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 05:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 05:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 05:24 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 05:23 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T332292|T332292]]', diff saved to https://phabricator.wikimedia.org/P45942 and previous config saved to /var/cache/conftool/dbconfig/20230327-051941-root.json | |||
* 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch [[phab:T331510|T331510]] | |||
* 05:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch [[phab:T331510|T331510]] | |||
== | == 2023-03-25 == | ||
* | * 07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s) | ||
* 07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 | |||
* 00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host | |||
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host | |||
* 00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host | |||
* | * 00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer | ||
* 00:09 tzatziki: removing 2 files for legal compliance | |||
* | |||
* | |||
* | |||
* | |||
* 00: | |||
== | == 2023-03-24 == | ||
* 23: | * 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]" | ||
* 23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]" | |||
* 23: | * 23:50 tzatziki: removing 1 file for legal compliance | ||
* 23: | * 21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after {{Gerrit|I4fe17f397856361}} which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. [[phab:T328907|T328907]] | ||
* | * 18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s) | ||
* | * 18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable | ||
* | * 18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s) | ||
* | * 18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags | ||
* 18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s) | |||
* 18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag | |||
* | * 17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s) | ||
* 17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag | |||
* 18: | * 15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | ||
* 18: | * 15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | ||
* 17: | * 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | ||
* 17: | * 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | ||
* | * 15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | ||
* | * 15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | ||
* | * 15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | ||
* | * 14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | ||
* 14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request [[:phab:T332917{{!}}T332917]]" # [[phab:T332917|T332917]] | |||
* 15: | * 11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet | ||
* 15:35 | * 11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet | ||
* 15: | * 11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync | ||
* | * 11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync | ||
* | * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update | ||
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update | |||
* | * 10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | ||
* | * 10:00 marostegui: Upgrade db1204 to mariadb 10.6 [[phab:T330861|T330861]] | ||
* | * 08:57 hashar: Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the Github RSA ssh host key [[phab:T332972|T332972]] | ||
* 10: | * 05:37 hashar: gerrit: refreshed ssh host key for `github.com` | ||
* 05:28 hashar: Restarted Gerrit | |||
* 05:26 hashar: Stopping Gerrit | |||
* | * 05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 10s) | ||
* 05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) | |||
* | * 05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org | ||
* | * 05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 07s) | ||
* | * 05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) | ||
* | * 05:17 hashar: Restarting Gerrit for deploying plugins updates | ||
* | * 05:10 ejegg: Standalone SmashPig upgraded from {{Gerrit|3b84e4cb}} to {{Gerrit|50139e82}} | ||
* | * 05:04 ejegg: payments-wiki upgraded from {{Gerrit|4d0c90b4}} to {{Gerrit|4b0a71fa}} | ||
* | * 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* | * 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* | |||
* | |||
* | |||
* | |||
* | |||
* | |||
* 00: | |||
* 00: | |||
* 00: | |||
== | == 2023-03-23 == | ||
* | * 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - [[phab:T332952|T332952]] | ||
* 22: | * 22:20 mutante: moscovium performing apt-get full-upgrade [[phab:T332952|T332952]] | ||
* 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and you replace the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines - that this is needed is called a bug at https://shagain.club/index.php/archives/641/ - [[phab:T327068|T327068]] | |||
* 22: | * 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) [[phab:T327068|T327068]] | ||
* | * 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye | ||
* | * 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) [[phab:T327068|T327068]] | ||
* | * 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade | ||
* | * 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade | ||
* 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage | |||
* | * 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage | ||
* | * 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | ||
* | * 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]" | ||
* 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]" | |||
* | * 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye | ||
* 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | |||
* 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye | |||
* | * 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | ||
* | * 20:33 taavi@deploy2002: Finished scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] (duration: 10m 56s) | ||
* | * 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet | ||
* | * 20:24 taavi@deploy2002: abi and taavi: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | ||
* 20:23 taavi@deploy2002: Started scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] | |||
* | * 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors | ||
* | * 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors | ||
* | * 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001" | ||
* | * 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001" | ||
* 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* | * 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet | ||
* | * 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002 | ||
* | * 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | ||
* | * 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | ||
* 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* | * 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002 | ||
* | * 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] | ||
* | * 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply | ||
* | * 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply | ||
* | * 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply | ||
* | * 17:38 mutante: moscovium - systemctl stop rsync | ||
* | * 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply | ||
* 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply | |||
* 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply | |||
* 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer | |||
* 16:59 sukhe: rolling out CR 901333 to A:cp-text [[phab:T313578|T313578]] | |||
* 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333 | |||
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye | |||
* 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye | |||
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage | |||
* 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage | |||
* 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync | |||
* 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync | |||
* 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye | |||
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage | |||
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage | |||
* 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - [[phab:T332796|T332796]] | |||
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye | |||
* 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet | |||
* 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye | |||
* 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet | |||
* 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage | |||
* 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org | |||
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage | |||
* 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye | |||
* 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors | |||
* 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors | |||
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002" | |||
* 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye | |||
* 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet | |||
* 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002" | |||
* 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply | |||
* 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply | |||
* 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply | |||
* 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox | |||
* 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org | |||
* 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply | |||
* 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s) | |||
* 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply | |||
* 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet | |||
* 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply | |||
* 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage | |||
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] | |||
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s) | |||
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] | |||
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s) | |||
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage | |||
* 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] | |||
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet | |||
* 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye | |||
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet | |||
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s) | |||
* 13:46 TheresNoTime: close UTC afternoon backport window | |||
* 13:45 samtar@deploy2002: Finished scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] (duration: 07m 46s) | |||
* 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] | |||
* 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s) | |||
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] | |||
* 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s) | |||
* 13:39 samtar@deploy2002: samtar: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:37 samtar@deploy2002: Started scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] | |||
* 13:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s) | |||
* 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] | |||
* 13:29 samtar@deploy2002: samtar and sgimeno: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] | |||
* 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` [[phab:T332470|T332470]] | |||
* 13:25 samtar@deploy2002: Finished scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] (duration: 08m 39s) | |||
* 13:18 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 13:16 samtar@deploy2002: Started scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] | |||
* 13:15 samtar@deploy2002: Finished scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] (duration: 11m 47s) | |||
* 13:08 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 13:03 samtar@deploy2002: Started scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] | |||
* 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye | |||
* 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage | |||
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye | |||
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage | |||
* 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - [[phab:T332796|T332796]] | |||
* 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye | |||
* 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage | |||
* 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage | |||
* 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye | |||
* 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye | |||
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage | |||
* 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage | |||
* 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage | |||
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage | |||
* 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye | |||
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org | |||
* 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye | |||
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors | |||
* 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors | |||
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002" | |||
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage | |||
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage | |||
* 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002" | |||
* 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox | |||
* 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org | |||
* 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye | |||
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage | |||
* 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage | |||
* 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]] | |||
* 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - [[phab:T332803|T332803]] | |||
* 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - [[phab:T332796|T332796]] | |||
* 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster | |||
* 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - [[phab:T332796|T332796]] | |||
* 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - [[phab:T332803|T332803]] | |||
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay | |||
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay | |||
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay | |||
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay | |||
* 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - [[phab:T332803|T332803]] | |||
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay | |||
* 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay | |||
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json | |||
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json | |||
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json | |||
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json | |||
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json | |||
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json | |||
* 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye | |||
* 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | |||
* 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye | |||
* 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for [[phab:T331896|T331896]] - no automatic sync / deploy for these | |||
* 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]" | |||
* 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]" | |||
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | |||
* 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye | |||
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye | |||
* 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet | |||
* 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye | |||
== | == 2023-03-22 == | ||
* 23: | * 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage | ||
* 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage | |||
* 23: | * 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye | ||
* 23: | * 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors | ||
* 23: | * 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors | ||
* 23: | * 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001" | ||
* | * 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001" | ||
* | * 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # [[phab:T332782|T332782]] | ||
* | * 23:31 zabe@deploy2002: Finished scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] (duration: 10m 03s) | ||
* | * 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org | ||
* | * 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox | ||
* | * 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet | ||
* | * 23:22 zabe@deploy2002: zabe: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 23:21 zabe@deploy2002: Started scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] | ||
* 21:15 taavi: UTC late backports complete | |||
* | * 21:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] (duration: 07m 29s) | ||
* 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet | |||
* | * 21:08 taavi@deploy2002: taavi: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | ||
* | * 21:06 taavi@deploy2002: Started scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] | ||
* 21:05 taavi@deploy2002: Finished scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] (duration: 07m 17s) | |||
* | * 20:59 taavi@deploy2002: taavi: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | ||
* | * 20:58 taavi@deploy2002: Started scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] | ||
* 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:900748{{!}}Enable page tools for anonymous users (T331052)]] (duration: 10m 10s) | |||
* | * 20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for ⚓ [[phab:T332803|T332803]] | ||
* | * 20:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] (duration: 11m 47s) | ||
* | * 20:32 akosiaris: reboot kubernetes1023 for a test once more, ⚓ [[phab:T332803|T332803]] | ||
* | * 20:32 akosiaris: reboot kubernetes1023 for a test once more | ||
* | * 20:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | ||
* | * 20:25 akosiaris: reboot kubernetes1023 for a test | ||
* | * 20:24 samtar@deploy2002: Started scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] | ||
* | * 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] (duration: 09m 57s) | ||
* | * 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors | ||
* | * 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors | ||
* | * 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | ||
* 20:15 samtar@deploy2002: kharlan and samtar: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* | * 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] | ||
* | * 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors | ||
* 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors | |||
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001" | |||
* 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001" | |||
* 20:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] (duration: 07m 22s) | |||
* 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet | |||
* 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org | |||
* 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org | |||
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors | |||
* 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors | |||
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors | |||
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors | |||
* 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | |||
* 20:04 samtar@deploy2002: samtar and matmarex: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s) | |||
* 20:02 samtar@deploy2002: Started scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] | |||
* 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 | |||
* 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org | |||
* 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] | |||
* 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running intial scap or automatic rsync.. this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ [[phab:T331896|T331896]] - but also see [[phab:T332101|T332101]] | |||
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org | |||
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001" | |||
* 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page | |||
* 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # [[phab:T332121|T332121]] (duration: 06m 44s) | |||
* 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001" | |||
* 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org | |||
* 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s) | |||
* 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) | |||
* 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply | |||
* 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply | |||
* 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - [[phab:T332796|T332796]] | |||
* 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply | |||
* 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply | |||
* 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply | |||
* 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply | |||
* 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye | |||
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet | |||
* 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]] | |||
* 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet | |||
* 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet | |||
* 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage | |||
* 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage | |||
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet | |||
* 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye | |||
* 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available) | |||
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply | |||
* 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet | |||
* 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply | |||
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 15:22 hnowlan: removing java packages from maps hosts | |||
* 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply | |||
* 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply | |||
* 15:13 hnowlan: removing cassandra packages from maps hosts | |||
* 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | |||
* 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage | |||
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json | |||
* 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage | |||
* 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change | |||
* 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change | |||
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json | |||
* 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json | |||
* 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json | |||
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json | |||
* 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deplying latest country_project_page DAG (duration: 00m 12s) | |||
* 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deplying latest country_project_page DAG | |||
* 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json | |||
* 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) [[phab:T323961|T323961]] | |||
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet | |||
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted [[phab:T323961|T323961]]', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json | |||
* 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet | |||
* 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet | |||
* 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet | |||
* 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 11:02 jbond: upgrader prometheus-ipmi-exporter on buster and bullseye | |||
* 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet | |||
* 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet | |||
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh on works fine) | |||
* 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet | |||
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye | |||
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage | |||
* 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage | |||
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye | |||
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet | |||
* 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet | |||
* 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet | |||
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet | |||
* 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet | |||
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye | |||
* 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye | |||
* 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | |||
* 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | |||
* 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES | |||
* 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082 | |||
* 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082 | |||
* 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s) | |||
* 00:50 zabe@deploy2002: Started scap: update interwiki cache | |||
* 00:47 zabe@deploy2002: Finished scap: [[phab:T332115|T332115]] (duration: 06m 56s) | |||
* 00:40 zabe@deploy2002: Started scap: [[phab:T332115|T332115]] | |||
* 00:40 zabe: create Wikipedia Angika (anpwiki) # [[phab:T332115|T332115]] | |||
* 00:38 zabe@deploy2002: Finished scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] (duration: 27m 00s) | |||
* 00:29 zabe@deploy2002: zabe: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 00:11 zabe@deploy2002: Started scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] | |||
== | == 2023-03-21 == | ||
* 23: | * 23:46 zabe@deploy2002: Finished scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] (duration: 30m 08s) | ||
* | * 23:35 zabe@deploy2002: zabe: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | ||
* 20: | * 23:15 zabe@deploy2002: Started scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] | ||
* 19: | * 23:07 zabe@deploy2002: Finished scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] (duration: 07m 10s) | ||
* 18: | * 23:00 zabe@deploy2002: Started scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] | ||
* 17: | * 22:47 ejegg: payments-wiki upgraded from {{Gerrit|0fd66b1f}} to {{Gerrit|ab0a55a2}} | ||
* 16: | * 22:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] (duration: 07m 15s) | ||
* 15: | * 22:04 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 22:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] | ||
* | * 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye | ||
* | * 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | ||
* 02: | * 21:02 AndyRussG: update SmashPig config {{Gerrit|6e651fd4}} -> {{Gerrit|035f602a}} | ||
* | * 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye | ||
* 00:02 | * 20:48 taavi: start [[phab:T315510|T315510]] migration script on group2 s7 wikis | ||
* 20:39 taavi@deploy2002: Finished scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] (duration: 09m 01s) | |||
* 20:31 taavi@deploy2002: matmarex and taavi: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:30 taavi@deploy2002: Started scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] | |||
* 20:20 taavi@deploy2002: Finished scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] (duration: 17m 40s) | |||
* 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:02 taavi@deploy2002: Started scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] | |||
* 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye | |||
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts | |||
* 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts | |||
* 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage | |||
* 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s) | |||
* 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag | |||
* 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage | |||
* 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye | |||
* 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye | |||
* 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] | |||
* 18:00 AndyRussG: update SmashPig config {{Gerrit|59a8b2d2}} -> {{Gerrit|6e651fd}} | |||
* 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org | |||
* 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s) | |||
* 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] | |||
* 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet | |||
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent' | |||
* 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster | |||
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors | |||
* 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors | |||
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001" | |||
* 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001" | |||
* 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org | |||
* 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye | |||
* 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors | |||
* 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors | |||
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001" | |||
* 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001" | |||
* 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox | |||
* 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet | |||
* 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5 | |||
* 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org | |||
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors | |||
* 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors | |||
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors | |||
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors | |||
* 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | |||
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox | |||
* 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org | |||
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:42 jbond: stop puppet from deploying this further | |||
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage | |||
* 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage | |||
* 15:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] (duration: 09m 11s) | |||
* 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:19 samtar@deploy2002: samtar: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 15:17 samtar@deploy2002: Started scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] | |||
* 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 15:10 samtar@deploy2002: Finished scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] (duration: 09m 32s) | |||
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad | |||
* 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 15:02 samtar@deploy2002: samtar: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad | |||
* 15:00 samtar@deploy2002: Started scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] | |||
* 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet | |||
* 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet | |||
* 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet | |||
* 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet | |||
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 14:38 hnowlan: disabling puppet on maps* before merging 760619 | |||
* 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts | |||
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts | |||
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s) | |||
* 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 14:02 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] | |||
* 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. | |||
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. | |||
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet | |||
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware | |||
* 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - [[phab:T319372|T319372]] | |||
* 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | |||
* 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00 | |||
* 11:08 joal: Kill mediacounts_load oozie job | |||
* 11:07 joal: Unpause mediawiki_history_denormalize airflow job | |||
* 11:06 joal: Kill mediawiki_denormalize oozie job | |||
* 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s) | |||
* 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] | |||
* 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s) | |||
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] | |||
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s) | |||
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] | |||
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s) | |||
* 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] | |||
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye | |||
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage | |||
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage | |||
* 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s) | |||
* 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided) | |||
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, pupper tries to bring them up periodically, spam on IRC | |||
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, pupper tries to bring them up periodically, spam on IRC | |||
* 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - [[phab:T319372|T319372]] | |||
* 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150 | |||
* 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150 | |||
* 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s) | |||
* 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] (duration: 52m 38s) | |||
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] | |||
== | == 2023-03-20 == | ||
* | * 22:00 samtar@deploy2002: Finished scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] (duration: 09m 45s) | ||
* | * 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | ||
* | * 21:50 samtar@deploy2002: Started scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] | ||
* 18: | * 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` [[phab:T332614|T332614]] | ||
* 18:18 | * 21:25 TheresNoTime: closing UTC late backport window, extended | ||
* 17: | * 21:22 samtar@deploy2002: Finished scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] (duration: 12m 22s) | ||
* 17: | * 21:11 samtar@deploy2002: samtar and aleksandar: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 21:10 samtar@deploy2002: Started scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] | ||
* 17:14 | * 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s) | ||
* | * 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer | ||
* | * 21:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] (duration: 08m 34s) | ||
* | * 21:02 samtar@deploy2002: matmarex and samtar: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | ||
* | * 21:00 TheresNoTime: extending UTC late backport window | ||
* | * 21:00 samtar@deploy2002: Started scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] | ||
* 20:58 kharlan@deploy2002: Finished scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] (duration: 10m 28s) | |||
* 20:49 kharlan@deploy2002: kharlan: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn | |||
* 20:47 kharlan@deploy2002: Started scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] | |||
* 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) [[phab:T257317|T257317]] [[phab:T332623|T332623]] [[phab:T331896|T331896]] | |||
* 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s) | |||
* 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator | |||
* 18:56 ejegg: switched back to new PayPal pending transaction resolver | |||
* 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s) | |||
* 18:47 akosiaris: emergency rollover of redis password complete | |||
* 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry* | |||
* 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s) | |||
* 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script | |||
* 18:42 ejegg: civicrm upgraded from {{Gerrit|3d3606f1}} to {{Gerrit|09373b9d}} | |||
* 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync | |||
* 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync | |||
* 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync | |||
* 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync | |||
* 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost ([[phab:T331896|T331896]]) | |||
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet | |||
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet | |||
* 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP ([[phab:T196968|T196968]]) | |||
* 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance | |||
* 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance | |||
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update | |||
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update | |||
* 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry* | |||
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update | |||
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update | |||
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update | |||
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update | |||
* 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye | |||
* 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye | |||
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552 | |||
* 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552 | |||
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json | |||
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]] | |||
* 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]] | |||
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:11 TheresNoTime: close UTC afternoon backport window | |||
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates | |||
* 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates | |||
* 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` [[phab:T331762|T331762]] | |||
* 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` [[phab:T331762|T331762]] | |||
* 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` [[phab:T332351|T332351]] | |||
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` [[phab:T331762|T331762]] | |||
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") [[phab:T331762|T331762]] | |||
* 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` [[phab:T331762|T331762]] | |||
* 13:58 samtar@deploy2002: Finished scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] (duration: 09m 44s) | |||
* 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:49 samtar@deploy2002: Started scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] | |||
* 13:47 samtar@deploy2002: Finished scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] (duration: 09m 26s) | |||
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye | |||
* 13:39 samtar@deploy2002: aleksandar and samtar: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:38 samtar@deploy2002: Started scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] | |||
* 13:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] (duration: 08m 46s) | |||
* 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates | |||
* 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates | |||
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates | |||
* 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates | |||
* 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 30s) | |||
* 13:29 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage | |||
* 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} | |||
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] | |||
* 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 39s) | |||
* 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage | |||
* 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} | |||
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] (duration: 11m 36s) | |||
* 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye | |||
* 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 13:08 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:06 samtar@deploy2002: Started scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] | |||
* 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s) | |||
* 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692 | |||
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692 | |||
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956 | |||
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956 | |||
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082 | |||
* 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082 | |||
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655 | |||
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655 | |||
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552 | |||
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552 | |||
* 09:21 claime: Repooling parse2004 - [[phab:T332119|T332119]] | |||
* 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915 | |||
* 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915 | |||
* 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915 | |||
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915 | |||
== | == 2023-03-19 == | ||
* | * 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) {{Gerrit|27a5b481}} -> {{Gerrit|6359222d}} | ||
* | * 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done | ||
* 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) {{Gerrit|5dd37c9c}} -> {{Gerrit|3d3606f1}} | |||
* 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host | |||
* 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s) | |||
* 15: | * 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | ||
* | |||
* | |||
* | |||
== | == 2023-03-18 == | ||
* | * 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s) | ||
* 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | |||
* | * 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap | ||
* | * 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap | ||
* | * 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, pupper tries to bring them up periodically, spam on IRC | ||
* | * 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, pupper tries to bring them up periodically, spam on IRC | ||
* 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s) | |||
* 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | |||
* 01:21 urandom: powercycling restbase2025 — [[phab:T332462|T332462]] | |||
* 00:06 AndyRussG: Updating civicrm from {{Gerrit|5dd37c9c}} to {{Gerrit|3d3606f1}} | |||
* 07: | |||
* 02: | |||
* 02: | |||
* | |||
* | |||
== | == 2023-03-17 == | ||
* | * 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s) | ||
* | * 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching | ||
* | * 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space | ||
* | * 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s) | ||
* | * 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch | ||
* 18: | * 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates | ||
* 18: | * 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates | ||
* 18: | * 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates | ||
* | * 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates | ||
* | * 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates | ||
* | * 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates | ||
* | * 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s) | ||
* | * 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | ||
* 15: | * 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates | ||
* | * 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates | ||
* | * 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates | ||
* | * 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates | ||
* | * 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet | ||
* | * 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet | ||
* | * 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* | * 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* | * 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates | ||
* | * 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates | ||
* | * 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | ||
* | * 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | ||
* | * 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | ||
* | * 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | ||
* | * 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | ||
* | * 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | ||
* | * 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye | ||
* | * 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye | ||
* | * 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | ||
* | * 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart | ||
* | * 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) | ||
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart | |||
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | |||
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart | |||
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) | |||
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart | |||
* 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet | |||
* 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - [[phab:T332119|T332119]] | |||
* 12:06 mutante: systemct-reset failed on gitlab-runner* | |||
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. | |||
* 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'. | |||
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json | |||
* 02:10 ejegg: civicrm upgraded from {{Gerrit|672950d9}} to {{Gerrit|5dd37c9c}} | |||
* 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet | |||
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet | |||
* 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates | |||
* 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates | |||
* 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates | |||
* 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates | |||
* 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates | |||
* 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates | |||
== | == 2023-03-16 == | ||
* 23: | * 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates | ||
* 23: | * 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates | ||
* 23: | * 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates | ||
* 23: | * 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates | ||
* 23: | * 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye | ||
* 23: | * 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye | ||
* | * 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s) | ||
* | * 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 | ||
* | * 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage | ||
* 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage | |||
* | * 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage | ||
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage | |||
* | * 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye | ||
* | * 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye | ||
* | * 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet | ||
* | * 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet | ||
* | * 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors | ||
* | * 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors | ||
* | * 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001" | ||
* | * 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001" | ||
* | * 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox | ||
* | * 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet | ||
* | * 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors | ||
* | * 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors | ||
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* | * 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002" | ||
* 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002" | |||
* | * 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox | ||
* | * 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet | ||
* | * 22:24 ejegg: civicrm upgraded from {{Gerrit|68fa85cf}} to {{Gerrit|672950d9}} | ||
* | * 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | ||
* | * 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | ||
* | * 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | ||
* 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* | * 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] | ||
* 20:36 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): blockers hopefully resolved, rolling to all wikis | |||
* | * 20:35 TheresNoTime: close UTC late backport window | ||
* | * 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] (duration: 08m 18s) | ||
* 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* | * 20:26 samtar@deploy2002: Started scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] | ||
* | * 20:21 brennen@deploy2002: Finished scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s) | ||
* | * 20:14 brennen@deploy2002: brennen and jforrester: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | ||
* | * 20:12 brennen@deploy2002: Started scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] | ||
* | * 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s) | ||
* | * 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided) | ||
* | * 18:41 wfan: enable monthlyconvert for cz | ||
* | * 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s) | ||
* | * 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) | ||
* | * 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet | ||
* | * 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye | ||
* | * 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet | ||
* | * 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet | ||
* | * 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* | * 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* | * 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | ||
* | * 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary | ||
* | * 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary | ||
* 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye | |||
* | * 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye | ||
* | * 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye | ||
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates | |||
* | * 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* | * 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s) | ||
* | * 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. | ||
* 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet | |||
* | * 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet | ||
* 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates | |||
* | * 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates | ||
* 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back | |||
* 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet | |||
* 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]] | |||
* 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done] | |||
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler | |||
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner | |||
* 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver | |||
* 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver | |||
* 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]] | |||
* 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler | |||
* 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner | |||
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver | |||
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver | |||
* 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957 | |||
* 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts | |||
* 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts | |||
* 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install | |||
* 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install | |||
* 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | |||
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 14:06 urandom: ALTER-ing image_suggestions.suggestion table — [[phab:T328670|T328670]] | |||
* 13:35 kostajh: UTC afternoon deploys done | |||
* 13:34 kharlan@deploy2002: Finished scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s) | |||
* 13:28 kharlan@deploy2002: kharlan: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] | |||
* 13:15 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s) | |||
* 13:07 kharlan@deploy2002: kharlan: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:05 kharlan@deploy2002: Started scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] | |||
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad | |||
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad | |||
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install | |||
* 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install | |||
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad | |||
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad | |||
* 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams | |||
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams | |||
* 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams | |||
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin | |||
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams | |||
* 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs | |||
* 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs | |||
* 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install | |||
* 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install | |||
* 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin | |||
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs | |||
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs | |||
* 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet | |||
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw | |||
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw | |||
* 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . | |||
* 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . | |||
* 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | |||
* 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | |||
* 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin | |||
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin | |||
* 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | |||
* 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install | |||
* 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | |||
* 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install | |||
* 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw | |||
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw | |||
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | |||
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | |||
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . | |||
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . | |||
* 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . | |||
* 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . | |||
* 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | |||
* 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | |||
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json | |||
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet | |||
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox | |||
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet | |||
* 08:40 kostajh: UTC morning deploys (second round) done | |||
* 08:40 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] (duration: 12m 30s) | |||
* 08:29 kharlan@deploy2002: kharlan: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 08:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] | |||
* 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened | |||
* 07:36 tgr_: UTC morning deploys done | |||
* 07:34 tgr@deploy2002: Finished scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] (duration: 08m 13s) | |||
* 07:28 tgr@deploy2002: tgr: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* 07:26 tgr@deploy2002: Started scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] | |||
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json | |||
* 06:03 marostegui: Failover m5 from db1106 to db1176 - [[phab:T332155|T332155]] | |||
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]] | |||
* 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]] | |||
* 03:29 ejegg: payments-wiki upgraded from {{Gerrit|1532b107}} to {{Gerrit|0fd66b1f}} | |||
== | == 2023-03-15 == | ||
* | * 22:55 tzatziki: Removing 1 file for legal compliance | ||
* | * 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 55s) | ||
* 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) | |||
* 22: | * 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 28s) | ||
* 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) | |||
* | * 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s) | ||
* 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str | |||
* | * 21:59 brennen: end of phabricator update window ([[phab:T331915|T331915]]) | ||
* | * 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 40s) | ||
* | * 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) | ||
* | * 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 28s) | ||
* | * 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) | ||
* 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) (duration: 00m 52s) | |||
* | * 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) | ||
* | * 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s) | ||
* 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] | |||
* | * 21:13 mutante: phab* - upgrading PHP packages | ||
* 21:13 mutante: phabricator - maintenance window starting - expect possible downtime | |||
* | * 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance | ||
* 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance | |||
* 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) (duration: 00m 31s) | |||
* | * 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) | ||
* | * 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002 | ||
* 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s) | |||
* | * 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor | ||
* | * 20:48 TheresNoTime: close UTC late backport window | ||
* | * 20:48 samtar@deploy2002: Finished scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] (duration: 08m 46s) | ||
* | * 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 20:39 samtar@deploy2002: Started scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] | ||
* | * 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] (duration: 10m 30s) | ||
* | * 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye | ||
* | * 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | ||
* | * 20:25 samtar@deploy2002: Started scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] | ||
* | * 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s) | ||
* | * 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye | ||
* | * 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye | ||
* 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage | |||
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye | |||
* 20:15 samtar@deploy2002: sgimeno and samtar: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] | |||
* 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage | |||
* 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s) | |||
* 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries | |||
* 20:11 taavi: deploy patch for [[phab:T331192|T331192]] | |||
* 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage | |||
* 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage | |||
* 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage | |||
* 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage | |||
* 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye | |||
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004'] | |||
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet'] | |||
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013'] | |||
* 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye | |||
* 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage | |||
* 19:49 taavi@deploy2002: Finished scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] (duration: 12m 04s) | |||
* 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye | |||
* 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage | |||
* 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye | |||
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004'] | |||
* 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye | |||
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet'] | |||
* 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye | |||
* 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye | |||
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004'] | |||
* 19:39 taavi@deploy2002: taavi: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet'] | |||
* 19:37 taavi@deploy2002: Started scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] | |||
* 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage | |||
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013'] | |||
* 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013'] | |||
* 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage | |||
* 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye | |||
* 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage | |||
* 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage | |||
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004'] | |||
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet'] | |||
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage | |||
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage | |||
* 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage | |||
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013'] | |||
* 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage | |||
* 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye | |||
* 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye | |||
* 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye | |||
* 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye | |||
* 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye | |||
* 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye | |||
* 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage | |||
* 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. ([[phab:T332115|T332115]]) | |||
* 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage | |||
* 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage | |||
* 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage | |||
* 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye | |||
* 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] (duration: 06m 08s) | |||
* 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet | |||
* 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye | |||
* 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] | |||
* 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s) | |||
* 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries | |||
* 18:06 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group1. | |||
* 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye | |||
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet | |||
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet | |||
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet | |||
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet | |||
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet | |||
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet | |||
* 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage | |||
* 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage | |||
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet | |||
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet | |||
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet | |||
* 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet | |||
* 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye | |||
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet | |||
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet | |||
* 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet | |||
* 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet | |||
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet | |||
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet | |||
* 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet | |||
* 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage | |||
* 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage | |||
* 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye | |||
* 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye | |||
* 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye | |||
* 16:02 hnowlan: restarted thumbor-instances on thumbor1006 | |||
* 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet | |||
* 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet | |||
* 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage | |||
* 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage | |||
* 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye | |||
* 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye | |||
* 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply | |||
* 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply | |||
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply | |||
* 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply | |||
* 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply | |||
* 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply | |||
* 14:53 claime: Redeploying mw-on-k8s for php7.4 update [[phab:T330270|T330270]] | |||
* 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:41 cgoubert@deploy2002: Started scap: (no justification provided) | |||
* 14:41 claime: Rebuilding mw-on-k8s images - [[phab:T330270|T330270]] | |||
* 14:38 claime: Updating php7.4 production images | |||
* 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage | |||
* 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage | |||
* 14:24 daniel@deploy2002: Finished scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s) | |||
* 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors | |||
* 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors | |||
* 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki | |||
* 14:22 jbond: switch pki to be active active | |||
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors | |||
* 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors | |||
* 14:19 jbond: update pki to use discovery record | |||
* 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki | |||
* 14:15 daniel@deploy2002: daniel: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye | |||
* 14:14 daniel@deploy2002: Started scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] | |||
* 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: [[phab:T321309|T321309]] | |||
* 14:12 sukhe: depool dns4002 for reimaging to bullseye: [[phab:T321309|T321309]] | |||
* 14:00 moritzm: nodejs security updates on buster | |||
* 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye | |||
* 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: [[phab:T321309|T321309]] | |||
* 13:49 moritzm: installing graphite-web security updates | |||
* 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage | |||
* 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage | |||
* 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. | |||
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. | |||
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 13:17 taavi@deploy2002: Finished scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] (duration: 09m 01s) | |||
* 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye | |||
* 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu | |||
* 13:08 taavi@deploy2002: Started scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] | |||
* 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:18 marostegui: Failover m5 from db1176 to db1106 - [[phab:T331877|T331877]] | |||
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]] | |||
* 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]] | |||
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply | |||
* 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply | |||
* 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply | |||
* 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply | |||
* 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply | |||
* 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply | |||
* 11:20 moritzm: imported packages into thirdparty/ceph-quincy | |||
* 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply | |||
* 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply | |||
* 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply | |||
* 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply | |||
* 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply | |||
* 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply | |||
* 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - [[phab:T331268|T331268]]/25 | |||
* 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. | |||
* 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. | |||
* 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply | |||
* 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply | |||
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply | |||
* 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply | |||
* 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply | |||
* 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply | |||
* 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo | |||
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply | |||
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply | |||
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply | |||
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply | |||
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply | |||
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply | |||
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply | |||
* 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply | |||
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply | |||
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply | |||
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply | |||
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply | |||
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply | |||
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply | |||
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply | |||
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply | |||
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main | |||
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main | |||
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply | |||
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply | |||
* 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply | |||
* 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply | |||
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply | |||
* 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply | |||
* 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo | |||
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates | |||
* 09:22 moritzm: installing gnutls28 security updates | |||
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T331875|T331875]]', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json | |||
* 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] (duration: 00m 19s) | |||
* 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] | |||
* 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 07:40 tgr_: UTC morning deploys done | |||
* 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet | |||
* 07:36 tgr@deploy2002: Finished scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] (duration: 07m 54s) | |||
* 07:30 tgr@deploy2002: tgr: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 07:28 tgr@deploy2002: Started scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] | |||
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json | |||
* 06:20 marostegui: Remove pki2001 from m1 grants [[phab:T332018|T332018]] | |||
== | == 2023-03-14 == | ||
* 23: | * 23:29 brennen@deploy2002: Finished scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] (duration: 10m 32s) | ||
* 23:20 brennen@deploy2002: brennen and umherirrender: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* | * 23:19 brennen@deploy2002: Started scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] | ||
* 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* | * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 22: | * 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 22: | * 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 22: | * 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 22: | * 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 21: | * 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 21: | * 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 21: | * 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* 21: | * 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 20:43 ejegg: payments-wiki upgraded from {{Gerrit|61c30a4f}} to {{Gerrit|1532b107}} | ||
* | * 20:35 zabe@deploy2002: Finished scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] (duration: 08m 36s) | ||
* 20:28 zabe@deploy2002: zabe: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* | * 20:27 zabe@deploy2002: Started scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] | ||
* 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* | * 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version [[phab:T327919|T327919]] | ||
* 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade | |||
* | * 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade | ||
* | * 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | ||
* | * 19:30 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): uneventful at group0. i'm afk for about an hour. | ||
* | * 19:13 ejegg: civicrm upgraded from {{Gerrit|dbe3b716}} to {{Gerrit|68fa85cf}} | ||
* 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye | |||
* | * 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage | ||
* | * 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s) | ||
* | * 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | ||
* 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage | |||
* | * 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply | ||
* | * 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply | ||
* | * 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply | ||
* | * 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s) | ||
* | * 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided) | ||
* | * 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply | ||
* | * 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] | ||
* | * 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye | ||
* | * 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply | ||
* | * 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply | ||
* | * 18:03 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group0. | ||
* | * 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | ||
* | * 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. | ||
* | * 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | ||
* | * 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | ||
* | * 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | ||
* | * 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | ||
* | * 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | ||
* | * 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | ||
* | * 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | ||
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye | |||
* 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye | |||
* 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet | |||
* 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes | |||
* 16:47 sukhe: rolling restart of pdns-rec to pick up config changes | |||
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet | |||
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001" | |||
* 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001" | |||
* 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox | |||
* 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph | |||
* 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph | |||
* 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet | |||
* 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye | |||
* 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage | |||
* 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database | |||
* 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database | |||
* 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage | |||
* 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission | |||
* 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission | |||
* 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye | |||
* 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. | |||
* 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply | |||
* 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply | |||
* 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply | |||
* 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply | |||
* 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply | |||
* 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply | |||
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001 | |||
* 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001 | |||
* 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye | |||
* 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage | |||
* 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - [[phab:T331541|T331541]] | |||
* 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage | |||
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json | |||
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye | |||
* 14:00 jbond: reimage pki1001 | |||
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!) | |||
* 13:27 TheresNoTime: close UTC afternoon backport window | |||
* 13:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] (duration: 07m 24s) | |||
* 13:20 samtar@deploy2002: samtar and urbanecm: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:19 samtar@deploy2002: Started scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] | |||
* 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results | |||
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] (duration: 07m 55s) | |||
* 13:11 samtar@deploy2002: esanders and samtar: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:10 samtar@deploy2002: Started scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] | |||
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage | |||
* 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage | |||
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage | |||
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage | |||
* 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye | |||
* 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye | |||
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance | |||
* 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance | |||
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json | |||
* 12:23 moritzm: installing git security updates | |||
* 12:20 samtar@deploy2002: Finished scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] (duration: 09m 12s) | |||
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json | |||
* 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of [[phab:T297396|T297396]] + [[phab:T331680|T331680]] — scap rolled back | |||
* 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye | |||
* 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: [[phab:T331541|T331541]] | |||
* 12:13 samtar@deploy2002: samtar and varnent: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 12:11 samtar@deploy2002: Started scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] | |||
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors | |||
* 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors | |||
* 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: [[phab:T331541|T331541]] | |||
* 12:06 claime: Unlocked scap deployments - [[phab:T331541|T331541]] | |||
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json | |||
* 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync | |||
* 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync | |||
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: [[phab:T331541|T331541]] | |||
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors | |||
* 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors | |||
* 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: [[phab:T331541|T331541]] | |||
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json | |||
* 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync | |||
* 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync | |||
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json | |||
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance | |||
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance | |||
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json | |||
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors | |||
* 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors | |||
* 11:13 claime: We are encountering unexpected DNS anycast issued following [[phab:T331541|T331541]], latencies are increased but no production outage. | |||
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json | |||
* 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors | |||
* 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors | |||
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors | |||
* 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors | |||
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage | |||
* 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage | |||
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json | |||
* 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: [[phab:T331541|T331541]] | |||
* 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: [[phab:T331541|T331541]] | |||
* 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]] | |||
* 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye | |||
* 10:42 jbond: reimage pki-root1001 | |||
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json | |||
* 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]] | |||
* 10:32 claime: Repooling all active/active services in eqiad - [[phab:T331541|T331541]] | |||
* 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0) | |||
* 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors | |||
* 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors | |||
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches | |||
* 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99) | |||
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches | |||
* 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - [[phab:T331541|T331541]] | |||
* 10:21 jbond: move pki.discovery.wmnet to pki2002 (buyllseye) | |||
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json | |||
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | |||
* 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | |||
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance | |||
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance | |||
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json | |||
* 10:15 jayme: enabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]] | |||
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json | |||
* 10:02 claime: Locking scap deployment for service switchover - [[phab:T331541|T331541]] | |||
* 10:00 claime: Locking scap deployment for service switchover - [[phab:T330651|T330651]] | |||
* 09:56 jayme: disabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]] | |||
* 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json | |||
* 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 09:36 moritzm: installing NSS security updates | |||
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json | |||
* 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 09:23 Emperor: reboot ms-be2040 [[phab:T331860|T331860]] | |||
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json | |||
* 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance | |||
* 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance | |||
* 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance | |||
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json | |||
* 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045 | |||
* 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o | |||
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json | |||
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json | |||
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json | |||
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json | |||
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance | |||
* 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance | |||
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json | |||
* 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 [[phab:T322294|T322294]] | |||
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json | |||
* 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6 | |||
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json | |||
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json | |||
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001 | |||
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale) | |||
* 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001 | |||
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json | |||
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance | |||
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance | |||
* 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance | |||
* 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance | |||
* 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` | |||
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` | |||
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` | |||
* 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s) | |||
* 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet | |||
* 04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122 | |||
* 04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003` | |||
* 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s) | |||
* 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] (duration: 51m 02s) | |||
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] | |||
* 02:22 legoktm: removed user's 2FA on wikitech for [[phab:T331955|T331955]] | |||
* 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json | |||
* 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json | |||
* 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json | |||
* 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json | |||
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json | |||
* 01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance | |||
* 01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance | |||
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json | |||
* 01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json | |||
* 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json | |||
* 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json | |||
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json | |||
* 00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance | |||
* 00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance | |||
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json | |||
* 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json | |||
== | == 2023-03-13 == | ||
* | * 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json | ||
* | * 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json | ||
* 04: | * 23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet | ||
* 02:26 | * 23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet | ||
* | * 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json | ||
* | * 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | ||
* | * 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | ||
* 02: | * 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance | ||
* 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance | |||
* 23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json | |||
* 23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json | |||
* 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json | |||
* 22:48 zabe@deploy2002: Finished scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]] (duration: 06m 56s) | |||
* 22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json | |||
* 22:41 zabe@deploy2002: Started scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]] | |||
* 22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s) | |||
* {{safesubst:SAL entry|1=22:40 zabe@deploy2002: Started scap: [[gerrit:898037}} | |||
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json | |||
* 22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance | |||
* 22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance | |||
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json | |||
* 22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s) | |||
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json | |||
* 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json | |||
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json | |||
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json | |||
* 21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance | |||
* 21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance | |||
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json | |||
* 21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye | |||
* 21:21 wfan: remove -d for jobs-dlocal queue runner | |||
* 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json | |||
* 21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet | |||
* 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json | |||
* 21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage | |||
* 21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage | |||
* 21:01 ejegg: enabled jobs-dlocal queue runner | |||
* 21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet | |||
* 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json | |||
* 20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye | |||
* 20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s) | |||
* 20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein | |||
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json | |||
* 20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance | |||
* 20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance | |||
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json | |||
* 20:27 kindrobot: close UTC late backport window | |||
* 20:26 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] (duration: 12m 11s) | |||
* 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json | |||
* 20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:15 kindrobot: start UTC late backport window | |||
* 20:14 kindrobot@deploy2002: Started scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] | |||
* 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json | |||
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet | |||
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet | |||
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json | |||
* 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet | |||
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet | |||
* 19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet | |||
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet | |||
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet | |||
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet | |||
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json | |||
* 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance | |||
* 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance | |||
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json | |||
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet | |||
* 19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet | |||
* 19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet | |||
* 19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet | |||
* 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json | |||
* 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json | |||
* 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet | |||
* 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet | |||
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet | |||
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet | |||
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json | |||
* 18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet | |||
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet | |||
* 18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet | |||
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet | |||