You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Server Admin Log: Difference between revisions
imported>Labslogbot (krenair@mira Synchronized php-1.27.0-wmf.12/tests: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 08s) (logmsgbot))
imported>Stashbot (Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 (T337446))
== 2023-05-27 ==
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 00:03 tzatziki: removing 1 file for legal compliance
== 2023-05-26 ==
* 23:48 tzatziki: removing 2 files for legal compliance
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 06m 10s)
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - [[phab:T329366|T329366]]
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to avoid the root partition to fill up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)
== 2023-05-25 ==
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 20:47 TheresNoTime: close UTC late backport
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* | * 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage | ||
* | * 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye | ||
* | * 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s) | ||
* | * 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | ||
* | * 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad | ||
* | * 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad | ||
* | * 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] | ||
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json | |||
* | * 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s) | ||
* | * 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye | ||
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31 | |||
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance | |||
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance | |||
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance | |||
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad | |||
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad | |||
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json | |||
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json | |||
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance | |||
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance | |||
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json | |||
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]] | |||
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . | |||
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000 to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}
== | == 2023-05-24 == | ||
* | * 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s) | ||
* 19: | * 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | ||
* 18: | * 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] | ||
* 18: | * 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s) | ||
* 18: | * 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | ||
* 17:54 | * 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] | ||
* | * 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s) | ||
* | * 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | ||
* | * 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] | ||
* | * 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* 16: | * 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330216|T330216]] (duration: 06m 00s) | ||
* | * 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330216|T330216]] | ||
* | * 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 06m 00s) | ||
* | * 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] | ||
* | * 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* 11: | * 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED | ||
* | * 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}} | ||
* | * 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s) | ||
* | * 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance | ||
* | * 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]] | ||
* 00: | * 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s) | ||
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]] | |||
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] | |||
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}} | |||
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s) | |||
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}} | |||
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] | |||
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s) | |||
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] | |||
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s) | |||
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] | |||
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 15:18 aqu: analytics-refinery, about to deploy | |||
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard) | |||
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors | |||
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors | |||
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard) | |||
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary | |||
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary | |||
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s) | |||
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s) | |||
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] | |||
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] | |||
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s) | |||
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t | |||
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] | |||
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s) | |||
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] | |||
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s) | |||
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] | |||
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s) | |||
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] | |||
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s) | |||
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances to core search backend was dead. | |||
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] | |||
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions {{Gerrit|5227369}} --mark [[phab:T337392|T337392]]` [[phab:T337392|T337392]] | |||
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]] | |||
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes | |||
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet | |||
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet | |||
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s) | |||
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset | |||
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]] | |||
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes | |||
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox | |||
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts | |||
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s) | |||
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] | |||
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync | |||
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync | |||
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync | |||
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync | |||
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106 | |||
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106 | |||
* 01:19 mutante: contint2001 - jenkins started again | |||
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:45 mutante: short maintenance on main contint server (jenkins) | |||
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance | |||
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance | |||
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance | |||
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance | |||
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance | |||
== | == 2023-05-23 == | ||
* | * 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done | ||
* | * 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance | ||
* | * 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance | ||
* | * 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance | ||
* | * 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]] | ||
* | * 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}} | ||
* | * 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}} | ||
* 05: | * 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* 02: | * 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* | * 21:02 TheresNoTime: close UTC late backport window | ||
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s) | |||
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] | |||
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s) | |||
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}} | |||
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] | |||
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001" | |||
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001" | |||
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox | |||
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027 | |||
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027 | |||
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026 | |||
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026 | |||
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025 | |||
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025 | |||
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025 | |||
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025 | |||
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024 | |||
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024 | |||
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024 | |||
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024 | |||
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024 | |||
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024 | |||
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023 | |||
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023 | |||
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022 | |||
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] | |||
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022 | |||
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}
== | == 2023-05-22 == | ||
* 23:20 | * 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}} | ||
* | * 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s) | ||
* | * 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] | ||
* | * 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]] | ||
* | * 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet | ||
* | * 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | ||
* 00: | * 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | ||
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet | |||
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet | |||
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001" | |||
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet | |||
* 20:27 TheresNoTime: close UTC late backport window | |||
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s) | |||
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] | |||
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s) | |||
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet | |||
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet | |||
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] | |||
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s) | |||
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) | |||
* 16:58 XioNoX: push mgmt_junos to all L2 switches | |||
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet | |||
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet | |||
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet | |||
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet | |||
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary | |||
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary | |||
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]" | |||
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]" | |||
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm | |||
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage | |||
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage | |||
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm | |||
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm | |||
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage | |||
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json | |||
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage | |||
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm | |||
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]] | |||
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path | |||
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s) | |||
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet | |||
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002" | |||
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002" | |||
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors | |||
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors | |||
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002" | |||
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002" | |||
* 10:02 moritzm: installing updated usb.ids packages for Bullseye | |||
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox | |||
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet | |||
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet | |||
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002" | |||
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002" | |||
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors | |||
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors | |||
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002" | |||
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002" | |||
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox | |||
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet | |||
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] | |||
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated) | |||
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin | |||
* 08:22 moritzm: installing systemd security updates | |||
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json | |||
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json | |||
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events | |||
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json | |||
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json | |||
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json | |||
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json | |||
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad) | |||
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json | |||
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json | |||
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json | |||
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json | |||
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json | |||
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json | |||
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json | |||
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json | |||
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json | |||
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json | |||
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002 | |||
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox | |||
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json | |||
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json | |||
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json | |||
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet | |||
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json | |||
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox | |||
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002 | |||
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet | |||
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json | |||
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json | |||
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json | |||
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json | |||
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json | |||
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json | |||
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json | |||
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]] | |||
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]] | |||
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json | |||
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]] | |||
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json | |||
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json | |||
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json | |||
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json | |||
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json | |||
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json | |||
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json | |||
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json | |||
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json | |||
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json | |||
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json | |||
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json | |||
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json | |||
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primaryT337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json | |||
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]] | |||
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json | |||
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]] | |||
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]] | |||
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json | |||
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json | |||
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json | |||
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json | |||
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json | |||
== 2023-05-21 ==
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
== 2023-05-20 ==
* 18:25 effie: restart varnish cp3061
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s) | |||
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] | |||
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet | |||
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001" | |||
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
== 2023-05-19 ==
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001" | |||
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]] | |||
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s) | |||
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | |||
* | * 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad | ||
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] | |||
* | * 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | ||
* | * 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service | ||
* | * 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service | ||
* 14:58 | * 14:35 sukhe: enable puppet on A:lvs, finished rolling out change | ||
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566 | |||
* 14: | * 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update | ||
* 14: | * 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update | ||
* 14: | * 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s) | ||
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1 | |||
* | * 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided) | ||
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1 | |||
* | * 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]]) | ||
* | * 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | ||
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad | |||
* | * 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw | ||
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye | |||
* | * 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002 | ||
* | * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" | ||
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage | |||
* | * 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" | ||
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage | |||
* | * 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet | ||
* | * 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox | ||
* | * 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet | ||
* | * 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002 | ||
* | * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye | ||
* | * 10:07 moritzm: installing ncurses security updates | ||
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json
== 2023-05-18 ==
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
* 22:20 mutante: short down-time for zuul-merger on contint2001
* 21:47 mutante: maintenance for zuul (CI) on contint servers
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s)
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]]
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s)
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]]
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s)
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]]
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s)
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]]
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs [[phab:T330215|T330215]]
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]]
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]]
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:18 TheresNoTime: closing backport window
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s)
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]]
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - [[phab:T332012|T332012]] (duration: 06m 19s)
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - [[phab:T332012|T332012]] (duration: 07m 00s)
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
* 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
* 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
* 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
* 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
* 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
* 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet | |||
* 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet | |||
* 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers ([[phab:T320508|T320508]]) | |||
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet | |||
* 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet | |||
* 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 ([[phab:T336483|T336483]]) | |||
* 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet | |||
* 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet | |||
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet | |||
* 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet | |||
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet | |||
* 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk | |||
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk | |||
* 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet | |||
* 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet | |||
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet | |||
* 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync | |||
* 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync | |||
* 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts | |||
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet | |||
* 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2 | |||
* 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet | |||
* 07:25 apergos: UTC morning backport and config training window done | |||
* 07:15 kartik@deploy1002: Finished scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 18s) | |||
* 07:07 kartik@deploy1002: kartik: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* 07:06 kartik@deploy1002: Started scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] | |||
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance | |||
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance | |||
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl [[phab:T336833|T336833]]', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json | |||
* 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance | |||
* 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance | |||
== | == 2023-05-17 == | ||
* | * 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001" | ||
* 22: | * 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001" | ||
* 22: | * 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox | ||
* | * 22:15 krinkle@deploy1002: Synchronized wmf-config/: [[phab:T332012|T332012]] (duration: 06m 51s) | ||
* 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet | |||
* 21: | * 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye | ||
* 21: | * 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye | ||
* | * 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request [[:phab:T333842{{!}}T333842]]" | ||
* | * 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet | ||
* 20: | * 20:32 urbanecm: UTC late B&C window done | ||
* 20: | * 20:29 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] (duration: 11m 36s) | ||
* 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw. | |||
* 20:17 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] | |||
* | * 20:15 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] (duration: 12m 06s) | ||
* 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* | * 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* | * 20:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] | ||
* | * 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | ||
* | * 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | ||
* | * 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet | ||
* 19:50 ejegg: payments-wiki upgraded from {{Gerrit|8988a598}} to {{Gerrit|a7567c6a}} | |||
* | * 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update [[phab:T331297|T331297]] | ||
* | * 19:01 Amir1: Removing db1112 from zarcillo [[phab:T336332|T336332]] | ||
* | * 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet | ||
* | * 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* | * 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | ||
* | * 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | ||
* | * 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox | ||
* | * 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet | ||
* 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] (duration: 06m 22s) | |||
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* | * 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s) | ||
* 18:03 brennen: train 1.41.0-wmf.9 ([[phab:T330215|T330215]]): no current blockers, rolling to group1 as backup-backup conductor | |||
* 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] | |||
* | * 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* | * 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | ||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* | * 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync | ||
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync | |||
* | * 17:19 brett: Maglev LVS scheduler rollout finished in esams - [[phab:T263797|T263797]] | ||
* | * 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for [[phab:T244570|T244570]] | ||
* | * 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | ||
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json | |||
* | * 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | ||
* | * 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json | ||
* | * 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | ||
* 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json | |||
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json | |||
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json | |||
* 15:52 brett: Rolling out maglev LVS scheduler in esams - [[phab:T263797|T263797]] | |||
* 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json | |||
* 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json | |||
* 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance | |||
* 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance | |||
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance | |||
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json | |||
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org | |||
* 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org | |||
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org | |||
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json | |||
* 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org | |||
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json | |||
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet | |||
* 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet | |||
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json | |||
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json | |||
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs | |||
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json | |||
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json | |||
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json | |||
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance | |||
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance | |||
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json | |||
* 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance | |||
* 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 06m 20s) | |||
* 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance | |||
* 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker | |||
* 14:36 moritzm: installing jackson-databind security updates | |||
* 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] (duration: 00m 09s) | |||
* 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] | |||
* 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] | |||
* 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync | |||
* 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync | |||
* 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync | |||
* 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - [[phab:T336817|T336817]] | |||
* 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - [[phab:T336726|T336726]] | |||
* 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs | |||
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs | |||
* 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 07m 30s) | |||
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet | |||
* 13:59 taavi@deploy1002: Finished scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] (duration: 07m 24s) | |||
* 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet | |||
* 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet | |||
* 13:54 taavi@deploy1002: matmarex and taavi: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:52 taavi@deploy1002: Started scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] | |||
* 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet | |||
* 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet | |||
* 13:47 taavi@deploy1002: Finished scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] (duration: 08m 11s) | |||
* 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs | |||
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs | |||
* 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet | |||
* 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet | |||
* 13:40 taavi@deploy1002: taavi and maurelio: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:38 taavi@deploy1002: Started scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] | |||
* 13:38 taavi@deploy1002: Finished scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] (duration: 07m 39s) | |||
* 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet | |||
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet | |||
* 13:32 taavi@deploy1002: stang and taavi: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:30 taavi@deploy1002: Started scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] | |||
* 13:29 taavi@deploy1002: Finished scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] (duration: 09m 15s) | |||
* 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs | |||
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet | |||
* 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet | |||
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs | |||
* 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker | |||
* 13:20 taavi@deploy1002: Started scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] | |||
* 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | |||
* 13:18 daniel@deploy1002: Finished scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] (duration: 11m 52s) | |||
* 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs | |||
* 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet | |||
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary | |||
* 13:07 daniel@deploy1002: daniel: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:06 daniel@deploy1002: Started scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] | |||
* 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet | |||
* 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary | |||
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json | |||
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json | |||
* 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet | |||
* 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet | |||
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001" | |||
* 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001" | |||
* 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet | |||
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json | |||
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json | |||
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json | |||
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json | |||
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json | |||
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json | |||
* 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces | |||
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json | |||
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance | |||
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance | |||
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json | |||
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json | |||
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance | |||
* 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance | |||
* 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json | |||
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json | |||
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json | |||
* 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json | |||
* 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance | |||
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json | |||
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json | |||
* 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json | |||
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json | |||
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json | |||
* 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json | |||
* 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json | |||
* 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json | |||
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json | |||
* 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply | |||
* 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply | |||
* 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply | |||
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json | |||
* 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply | |||
* 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply | |||
* 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply | |||
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json | |||
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply | |||
* 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json | |||
* 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance | |||
* 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply | |||
* 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance | |||
* 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply | |||
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json | |||
* 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance | |||
* 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance | |||
* 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json | |||
* 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json | |||
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json | |||
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json | |||
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json | |||
* 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json | |||
* 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json | |||
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json | |||
* 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json | |||
* 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json | |||
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json | |||
* 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json | |||
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json | |||
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance | |||
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance | |||
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json | |||
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json | |||
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json | |||
* 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance | |||
* 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance | |||
* 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - [[phab:T336726|T336726]] | |||
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json | |||
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json | |||
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json | |||
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json | |||
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json | |||
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json | |||
* 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet | |||
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet | |||
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json | |||
* 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json | |||
* 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:48 moritzm: upgrading krb1001 to Bullseye [[phab:T331695|T331695]] | |||
* 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye | |||
* 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye | |||
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json | |||
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468 | |||
* 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468 | |||
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json | |||
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json | |||
* 07:19 kartik@deploy1002: Finished scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] (duration: 07m 22s) | |||
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json | |||
* 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json | |||
* 07:11 kartik@deploy1002: Started scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] | |||
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json | |||
* 07:09 kartik@deploy1002: Backport cancelled. | |||
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json | |||
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json | |||
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json | |||
* 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply | |||
* 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply | |||
* 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply | |||
* 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply | |||
* 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply | |||
* 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply | |||
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json | |||
* 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply | |||
* 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply | |||
* 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply | |||
* 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply | |||
* 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply | |||
* 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply | |||
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json | |||
* 06:01 volans: restarted ferm on ms-be1047 | |||
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json | |||
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox | |||
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet | |||
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json | |||
* 05:16 marostegui: Optimize s7 on dbstore1003 [[phab:T336733|T336733]] | |||
* 00:21 krinkle@deploy1002: Synchronized src/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 01s) | |||
* 00:15 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 14s) | |||
* 00:07 krinkle@deploy1002: Synchronized lib/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 51s) | |||
== | == 2023-05-16 == | ||
* | * 20:59 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] (duration: 07m 18s) | ||
* | * 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | ||
* 20:52 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] | |||
* 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 20:49 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] (duration: 09m 19s) | |||
* 20:41 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:39 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] | |||
* 20:36 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] (duration: 07m 44s) | |||
* 20:30 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - [[phab:T263797|T263797]] | |||
* 20:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] | |||
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 19:04 sukhe: dummy run of authdns-update to confirm new hosts | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw | |||
* 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw | |||
* 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org | |||
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.* | |||
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.* | |||
* 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:47 ryankemper: [WDQS] Pooled `wdqs2012` | |||
* 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled) | |||
* 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": [[phab:T335777|T335777]] | |||
* 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: [[phab:T326688|T326688]] | |||
* 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": [[phab:T326688|T326688]] | |||
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye | |||
* 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage | |||
* 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage | |||
* 18:01 sukhe: enable puppet on A:cp-text | |||
* 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply | |||
* 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply | |||
* 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply | |||
* 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply | |||
* 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply | |||
* 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply | |||
* 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye | |||
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:40 moritzm: installing avahi security updates on buster | |||
* 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s) | |||
* 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] | |||
* 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:27 brett: Rolling out maglev LVS scheduler in drmrs - [[phab:T263797|T263797]] | |||
* 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org | |||
* 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" [[phab:T335777|T335777]] | |||
* 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts | |||
* 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet | |||
* 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | |||
* 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help ) | |||
* 16:06 mutante: gitlab-runner2003 - installed rsync client for debugging an issue with rsync from inside containers, comparing with rsync from outside the container | |||
* 15:49 sukhe: run authdns-update for CR 920314 | |||
* 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s) | |||
* 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] | |||
* 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now [[phab:T336775|T336775]] | |||
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: [[phab:T326688|T326688]] | |||
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ] | |||
* 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | |||
* 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply | |||
* 15:27 hashar: Restarting CI Jenkins | |||
* 15:26 Emperor: rebalance codfw swift rings [[phab:T335280|T335280]] | |||
* 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/ | |||
* 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:49 moritzm: installing libxml2 security updates on buster | |||
* 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]] | |||
* 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:43 hashar: Restarting CI Jenkins | |||
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]] | |||
* 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync | |||
* 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync | |||
* 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye | |||
* 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s) | |||
* 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) | |||
* 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]] | |||
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage | |||
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage | |||
* 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]] | |||
* 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye | |||
* 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad | |||
* 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 13:46 Emperor: repool ms-fe2012 [[phab:T335042|T335042]] | |||
* 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad | |||
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet | |||
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet | |||
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web | |||
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web | |||
* 13:32 taavi@deploy1002: Finished scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] (duration: 09m 08s) | |||
* 13:32 Emperor: repool thanos-fe2003 [[phab:T335042|T335042]] | |||
* 13:30 sukhe: running authdns-update to repool codfw | |||
* 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org | |||
* 13:25 taavi@deploy1002: mazevedo and taavi: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]] | |||
* 13:23 taavi@deploy1002: Started scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] | |||
* 13:01 XioNoX: asw-d-codfw> request system reboot all-members - [[phab:T335042|T335042]] | |||
* 12:52 Emperor: depool ms-fe2012 [[phab:T335042|T335042]] | |||
* 12:51 Emperor: depool thanos-fe2003 [[phab:T335042|T335042]] | |||
* 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]] | |||
* 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade | |||
* 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade | |||
* 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet | |||
* 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however | |||
* 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet | |||
* 12:35 godog: start cadvisor 0.44 upgrade to buster hosts - [[phab:T336740|T336740]] | |||
* 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s) | |||
* 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] | |||
* 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s) | |||
* 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] | |||
* 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]] | |||
* 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]] | |||
* 12:21 XioNoX: disable ping offload in codfw - [[phab:T335042|T335042]] | |||
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s) | |||
* 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] | |||
* 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:59 kart_: Updated cxserver to 2023-05-16-061239-production ([[phab:T336657|T336657]]) | |||
* 11:57 XioNoX: stage upgrade on asw-d-codfw - [[phab:T335042|T335042]] | |||
* 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s) | |||
* 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply | |||
* 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply | |||
* 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw | |||
* 11:50 marostegui: install 10.4.29 on db1151 [[phab:T336462|T336462]] | |||
* 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply | |||
* 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply | |||
* 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw | |||
* 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] | |||
* 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply | |||
* 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply | |||
* 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm | |||
* 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet | |||
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance | |||
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance | |||
* 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance | |||
* 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however | |||
* 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm | |||
* 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm | |||
* 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet | |||
* 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm | |||
* 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet | |||
* 11:00 moritzm: updated bookworm image to RC3 [[phab:T330495|T330495]] | |||
* 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet | |||
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet | |||
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet | |||
* 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet | |||
* 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet | |||
* 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]] | |||
* 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet | |||
* 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade | |||
* 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade | |||
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" | |||
* 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - [[phab:T317799|T317799]] | |||
* 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" | |||
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]] | |||
* 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox | |||
* 10:13 Amir1: cleaning up echo notification table in all wikis ([[phab:T318523|T318523]]) | |||
* 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s) | |||
* 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) | |||
* 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet | |||
* 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet | |||
* 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet | |||
* 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner | |||
* 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts | |||
* 09:21 marostegui: Optimize s5 on dbstore1003 [[phab:T336733|T336733]] | |||
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance | |||
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance | |||
* 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org | |||
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 07:28 Emperor: restart vopsbot.service on alert1001
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 06:56 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] (duration: 06m 58s)
* 06:51 marostegui@deploy1002: marostegui: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 06:50 eileen: civicrm: revision {{Gerrit|d97a371e}}, config {{Gerrit|686d3cb4}}
* 06:49 marostegui@deploy1002: Started scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]]
* 06:49 _joe_: running docker image prune -a in build2001
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json
* 06:24 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] (duration: 07m 08s)
* 06:24 eileen: civicrm upgraded from {{Gerrit|ef7b3822}} to {{Gerrit|d97a371e}}
* 06:18 marostegui@deploy1002: marostegui: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 06:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json
* 06:05 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] (duration: 07m 21s)
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json
* 05:59 marostegui@deploy1002: marostegui: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json
* 05:58 marostegui@deploy1002: Started scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json
* 05:43 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] (duration: 07m 15s)
* 05:38 marostegui@deploy1002: marostegui: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 05:36 marostegui@deploy1002: Started scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] (duration: 48m 47s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
== 2023-05-15 ==
* 23:37 eileen: civicrm upgraded from {{Gerrit|db6e8d69}} to {{Gerrit|ef7b3822}}
* 22:02 maryum: deployed patch for [[phab:T323651|T323651]]
* 21:51 maryum: Deployed patch for [[phab:T335612|T335612]]
* 21:42 ejegg: payments-wiki upgraded from {{Gerrit|c0da741f}} to {{Gerrit|8988a598}} (and globalcollect settings deleted)
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s)
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS
* 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s)
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s)
* 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet
* 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124
* 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - [[phab:T336434|T336434]]
* 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] [[phab:T335042|T335042]]
* 18:33 brett: Rolling out maglev LVS scheduler in eqsin - [[phab:T263797|T263797]]
* 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet