Server Admin Log: Difference between revisions
imported>Stashbot (topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.) |
imported>Stashbot (razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch) |
== 2022-03-03 == | |||
* 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch | |||
* 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch | |||
* 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet | |||
* 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox | |||
* 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet | |||
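The cookbook entries above come in pairs: a `START - Cookbook <name> ...` line and a later `END (<STATUS>) - Cookbook <name> (exit_code=N) ...` line from the same user and host. The following is a minimal sketch, assuming every entry keeps the `* HH:MM user@host: message` shape seen here, of pairing those lines to list each run with its status. The names `pair_cookbook_runs` and `Run` are hypothetical helpers for illustration and are not part of Spicerack or Stashbot.

```python
import re
from typing import NamedTuple

# Entry shape assumed from the log lines above: "* HH:MM who: message".
ENTRY_RE = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<who>[^:\s]+): (?P<msg>.*)$")
START_RE = re.compile(r"^START - Cookbook (?P<name>\S+)(?P<args>.*)$")
END_RE = re.compile(
    r"^END \((?P<status>[A-Z]+)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)(?P<args>.*)$"
)


class Run(NamedTuple):
    who: str
    name: str
    args: str
    started: str
    ended: str
    status: str
    exit_code: int


def pair_cookbook_runs(lines):
    """Pair START/END cookbook entries; input is newest-first, as in the log."""
    open_runs = {}
    runs = []
    for line in reversed(lines):  # walk oldest-first
        # Tolerate trailing diff-table pipes if the line was copied from a diff view.
        m = ENTRY_RE.match(line.rstrip(" |"))
        if not m:
            continue
        who, msg = m["who"], m["msg"]
        start = START_RE.match(msg)
        if start:
            open_runs[(who, start["name"], start["args"].strip())] = m["time"]
            continue
        end = END_RE.match(msg)
        if end:
            key = (who, end["name"], end["args"].strip())
            started = open_runs.pop(key, None)
            if started is not None:
                runs.append(Run(who, end["name"], key[2], started, m["time"],
                                end["status"], int(end["code"])))
    return runs


SAMPLE = [
    "* 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch",
    "* 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch",
    "* 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet",
    "* 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
    "* 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox",
    "* 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet",
]

for run in pair_cookbook_runs(SAMPLE):
    print(run.started, run.ended, run.status, run.exit_code, run.name, run.args)
```

Matching on the user, cookbook name, and argument text is a simplification: two identical runs that overlap in time would be confused, and an END line without a logged START is dropped.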
== 2022-03-02 == | == 2022-03-02 == | ||
* 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage | |||
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage | |||
* 23:25 ryankemper: [[phab:T276198|T276198]] Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from [[phab:T276198|T276198]]"'` | |||
* 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 23:21 ryankemper: [[phab:T276198|T276198]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly | |||
* 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 23:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 22:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply | |||
* 22:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply | |||
* 22:55 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 22:54 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 22:54 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply | |||
* 22:52 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply | |||
* 22:52 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply | |||
* 22:51 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply | |||
* 22:51 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply | |||
* 22:50 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply | |||
* 22:50 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply | |||
* 22:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply | |||
* 22:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply | |||
* 22:48 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply | |||
* 22:48 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply | |||
* 22:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply | |||
* 22:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply | |||
* 22:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply | |||
* 22:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply | |||
* 22:45 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply | |||
* 22:45 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply | |||
* 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply | |||
* 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply | |||
* 22:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply | |||
* 22:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply | |||
* 22:41 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply | |||
* 22:21 ryankemper: [[phab:T276198|T276198]] Downtimed `elastic1052` for 2 hours while troubleshooting | |||
* 22:16 ryankemper: [[phab:T276198|T276198]] Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing. | |||
* 22:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 21:59 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: [[gerrit:767104{{!}}Hooks.php: Check for non-array $tags (T302918)]] (duration: 00m 50s) | |||
* 21:53 ryankemper: [[phab:T276198|T276198]] Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host | |||
* 21:44 mutante: rolling out scap 4.4.2 on 'all' [[phab:T302919|T302919]] | |||
* 21:36 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 21:19 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767580{{!}}wmf-config: Undeploy the fawiki test survey from production (T300291)]] (duration: 00m 50s) | |||
* 21:13 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 21:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing scap 4.4.2 | |||
* 21:05 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye | |||
* 21:00 mutante: deploy1002 - upgraded scap to 4.4.2-1 [[phab:T302919|T302919]] | |||
* 20:48 mutante: running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say [[phab:T302919|T302919]] | |||
* 20:48 dzahn@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s) | |||
* 20:47 dzahn@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) | |||
* 20:44 mutante: tested 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary ([[phab:T302919|T302919]]) | |||
* 20:38 mutante: rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary ([[phab:T302919|T302919]]) | |||
* 20:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 20:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox | |||
* 19:57 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided) | |||
* 19:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided) | |||
* 19:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:46 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: [[gerrit:767103{{!}}Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907)]] (duration: 00m 50s) | |||
* 19:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 19:36 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:33 robh@cumin1001: START - Cookbook sre.dns.netbox | |||
* 19:30 mutante: stopped icinga-wm | |||
* 19:14 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs [[phab:T300200|T300200]] (duration: 00m 50s) | |||
* 19:13 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs [[phab:T300200|T300200]] | |||
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json | |||
* 19:10 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to group1 | |||
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json | |||
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json | |||
* 18:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json | |||
* 18:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json | |||
* 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance | |||
* 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance | |||
* 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json | |||
* 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply | |||
* 18:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply | |||
* 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply | |||
* 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply | |||
* 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply | |||
* 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply | |||
* 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply | |||
* 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply | |||
* 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply | |||
* 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply | |||
* 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply | |||
* 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply | |||
* 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply | |||
* 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply | |||
* 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json | |||
* 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json | |||
* 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json | |||
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json | |||
* 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance | |||
* 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance | |||
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json | |||
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json | |||
* 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json | |||
* 16:51 vgutierrez: pool cp3061 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]] | |||
* 16:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster | |||
* 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json | |||
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json | |||
* 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance | |||
* 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance | |||
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json | |||
* 16:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage | |||
* 16:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage | |||
* 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json | |||
* 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json | |||
* 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster | |||
* 15:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster | |||
* 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json | |||
* 15:47 vgutierrez: pool cp5014 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]] | |||
* 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet | |||
* 15:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet | |||
* 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json | |||
* 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance | |||
* 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance | |||
* 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance | |||
* 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance | |||
* 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json | |||
* 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json | |||
* 15:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage | |||
* 15:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage | |||
* 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json | |||
* 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json | |||
* 14:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster | |||
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json | |||
* 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance | |||
* 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance | |||
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json | |||
* 14:41 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:38 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json | |||
* 14:34 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:27 moritzm: rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones) | |||
* 14:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster | |||
* 14:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: [[gerrit:767099{{!}}ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop]] (duration: 00m 52s) | |||
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json | |||
* 14:13 mmandere: pool cp6013 | |||
* 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json | |||
* 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json | |||
* 14:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance | |||
* 14:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance | |||
* 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json | |||
* 13:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster | |||
* 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json | |||
* 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json | |||
* 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json | |||
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json | |||
* 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance | |||
* 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance | |||
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json | |||
* 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json | |||
* 12:47 reedy@deploy1002: Finished scap: Fix MassMessage translations [[phab:T302840|T302840]] (duration: 01m 50s) | |||
* 12:45 reedy@deploy1002: Started scap: Fix MassMessage translations [[phab:T302840|T302840]] | |||
* 12:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster | |||
* 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json | |||
* 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json | |||
* 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json | |||
* 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance | |||
* 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance | |||
* 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance | |||
* 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance | |||
* 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance | |||
* 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance | |||
* 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json | |||
* 12:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster | |||
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json | |||
* 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json | |||
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json | |||
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json | |||
* 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | |||
* 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | |||
* 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | |||
* 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | |||
* 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | |||
* 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | |||
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json | |||
* 11:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s) | |||
* 11:22 mbsantos: rollback maps eqiad to a previous working state to mitigate geoshape errors | |||
* 11:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" | |||
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json | |||
* 11:05 moritzm: installing expat security updates | |||
* 10:56 moritzm: restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update | |||
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json | |||
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json | |||
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json | |||
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance | |||
* 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance | |||
* 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 10:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply | |||
* 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply | |||
* 10:15 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply | |||
* 10:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s) | |||
* 10:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply | |||
* 10:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" | |||
* 10:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s) | |||
* 10:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" | |||
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json | |||
* 10:04 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet | |||
* 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply | |||
* 09:55 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply | |||
* 09:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json | |||
* 09:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s) | |||
* 09:49 klausman@cumin2002: START - Cookbook sre.dns.netbox | |||
* 09:49 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet | |||
* 09:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet | |||
* 09:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging | |||
* 09:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s) | |||
* 09:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging | |||
* 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json | |||
* 09:35 klausman@cumin2002: START - Cookbook sre.dns.netbox | |||
* 09:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet | |||
* 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json | |||
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json | |||
* 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json | |||
* 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance | |||
* 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance | |||
* 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json | |||
* 09:16 mmandere: rolling restart of varnishkafka-* on cp6* | |||
* 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json | |||
* 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json | |||
* 09:05 XioNoX: push Capirca managed labs-in firewall filter to eqiad routers | |||
* 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json | |||
* 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json | |||
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json | |||
* 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye | |||
* 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json | |||
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json | |||
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance | |||
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance | |||
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json | |||
* 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage | |||
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage | |||
* 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json | |||
* 08:09 godog: test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed | |||
* 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye | |||
* 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json | |||
* 08:02 Amir1: killing all entity dumpers of wikidata in snapshot1008 ([[phab:T300255|T300255]]) | |||
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json | |||
* 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json | |||
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance | |||
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance | |||
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance | |||
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance | |||
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance | |||
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance | |||
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json | |||
* 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance | |||
* 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance | |||
* 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json | |||
* 07:35 _joe_: filling request patterns in etcd | |||
* 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json | |||
* 07:09 _joe_: installing scap 4.4.1 everywhere [[phab:T302464|T302464]] | |||
* 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json | |||
* 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json | |||
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json | |||
* 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json | |||
* 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json | |||
* 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json | |||
* 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye | |||
* 05:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage | |||
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage | |||
* 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye | |||
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json | |||
* 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json | |||
* 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json | |||
* 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance | |||
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance | |||
* 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json | |||
* 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json | |||
* 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json | |||
* 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json | |||
* 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json | |||
* 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json | |||
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json | |||
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance | |||
* 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance | |||
* 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance | |||
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance | |||
* 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json | |||
* 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json | |||
* 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json | |||
* 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye | |||
* 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json | |||
* 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage | |||
* 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage | |||
* 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye | |||
* 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json | |||
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json | |||
* 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance | |||
* 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance | |||
* 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json | |||
* 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance | |||
* 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance | |||
* 03:43 ejegg: updated CiviCRM from {{Gerrit|e9f0eff5}} to {{Gerrit|cb0605ed}} | |||
* 02:13 ejegg: Fundraising CiviCRM updated from {{Gerrit|2874d623}} to {{Gerrit|e9f0eff5}} | |||
* 00:15 topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
* 00:07 topranks: disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
Revision as of 01:42, 3 March 2022
2022-03-03
- 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
- 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
- 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet
- 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox
- 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet
2022-03-02
- 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
- 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
- 23:25 ryankemper: T276198 Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from T276198"'`
- 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 23:21 ryankemper: T276198 https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
- 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 23:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 22:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 22:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 22:55 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 22:54 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 22:54 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 22:52 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
- 22:52 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 22:51 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 22:51 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 22:50 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 22:50 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 22:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 22:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
- 22:48 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
- 22:48 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 22:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 22:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 22:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 22:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 22:45 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 22:45 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
- 22:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
- 22:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
- 22:41 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
- 22:21 ryankemper: T276198 Downtimed `elastic1052` for 2 hours while troubleshooting
- 22:16 ryankemper: T276198 Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing.
- 22:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 21:59 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: Hooks.php: Check for non-array $tags (T302918) (duration: 00m 50s)
- 21:53 ryankemper: T276198 Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host
- 21:44 mutante: rolling out scap 4.4.2 on 'all' T302919
- 21:36 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 21:19 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wmf-config: Undeploy the fawiki test survey from production (T300291) (duration: 00m 50s)
- 21:13 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 21:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing scap 4.4.2
- 21:05 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
- 21:00 mutante: deploy1002 - upgraded scap to 4.4.2-1 T302919
- 20:48 mutante: running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say T302919
- 20:48 dzahn@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s)
- 20:47 dzahn@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
- 20:44 mutante: tested 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary (T302919)
- 20:38 mutante: rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary (T302919)
- 20:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
- 19:57 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
- 19:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
- 19:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 19:46 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907) (duration: 00m 50s)
- 19:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 19:36 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:33 robh@cumin1001: START - Cookbook sre.dns.netbox
- 19:30 mutante: stopped icinga-wm
- 19:14 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300200 (duration: 00m 50s)
- 19:13 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300200
- 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json
- 19:10 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group1
- 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json
- 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json
- 18:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json
- 18:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json
- 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
- 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
- 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json
- 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 18:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
- 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
- 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
- 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
- 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
- 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json
- 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json
- 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json
- 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json
- 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
- 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
- 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json
- 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json
- 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json
- 16:51 vgutierrez: pool cp3061 running HAProxy as TLS termination layer - T290005 T271421
- 16:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster
- 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json
- 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json
- 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json
- 16:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
- 16:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
- 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json
- 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json
- 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster
- 15:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster
- 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json
- 15:47 vgutierrez: pool cp5014 running HAProxy as TLS termination layer - T290005 T271421
- 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 15:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json
- 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json
- 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json
- 15:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
- 15:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
- 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json
- 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json
- 14:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster
- 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json
- 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
- 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
- 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json
- 14:41 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
- 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
- 14:38 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
- 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
- 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json
- 14:34 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
- 14:27 moritzm: rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones)
- 14:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
- 14:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
- 14:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop (duration: 00m 52s)
- 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json
- 14:13 mmandere: pool cp6013
- 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json
- 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json
- 14:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 14:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json
- 13:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
- 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json
- 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json
- 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json
- 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json
- 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
- 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
- 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json
- 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json
- 12:47 reedy@deploy1002: Finished scap: Fix MassMessage translations T302840 (duration: 01m 50s)
- 12:45 reedy@deploy1002: Started scap: Fix MassMessage translations T302840
- 12:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
- 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json
- 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json
- 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json
- 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
- 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
- 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
- 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
- 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
- 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
- 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json
- 12:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
- 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json
- 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json
- 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json
- 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json
- 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
- 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
- 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json
- 11:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s)
- 11:22 mbsantos: rollback maps eqiad to a previous working state to mitigate geoshape errors
- 11:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
- 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json
- 11:05 moritzm: installing expat security updates
- 10:56 moritzm: restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update
- 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json
- 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json
- 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json
- 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
- 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
- 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 10:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 10:15 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 10:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s)
- 10:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 10:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging"
- 10:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s)
- 10:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging"
- 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json
- 10:04 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet
- 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 09:55 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 09:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json
- 09:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s)
- 09:49 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 09:49 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet
- 09:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet
- 09:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging
- 09:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s)
- 09:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging
- 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json
- 09:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 09:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet
- 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
- 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json
- 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json
- 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
- 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
- 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json
- 09:16 mmandere: rolling restart of varnishkafka-* on cp6*
- 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json
- 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json
- 09:05 XioNoX: push Capirca managed labs-in firewall filter to eqiad routers
- 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json
- 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json
- 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
- 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye
- 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json
- 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json
- 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
- 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
- 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json
- 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
- 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
- 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json
- 08:09 godog: test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed
- 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye
- 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json
- 08:02 Amir1: killing all entity dumpers of wikidata in snapshot1008 (T300255)
- 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json
- 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json
- 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
- 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
- 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
- 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
- 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
- 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
- 07:35 _joe_: filling request patterns in etcd
- 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json
- 07:09 _joe_: installing scap 4.4.1 everywhere T302464
- 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json
- 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
- 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
- 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json
- 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json
- 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
- 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye
- 05:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
- 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
- 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye
- 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
- 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json
- 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
- 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
- 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
- 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
- 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json
- 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json
- 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json
- 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json
- 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json
- 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json
- 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
- 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
- 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json
- 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
- 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json
- 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye
- 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json
- 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
- 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
- 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye
- 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json
- 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
- 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
- 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
- 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json
- 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 03:43 ejegg: updated CiviCRM from e9f0eff5 to cb0605ed
- 02:13 ejegg: Fundraising CiviCRM updated from 2874d623 to e9f0eff5
- 00:15 topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
- 00:07 topranks: disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
2022-03-01
- 22:51 inflatador: T276198 reenabled puppet on elastic1052.eqiad.wmnet
- 22:37 inflatador: T276198 rebooting elastic1052.eqiad.wmnet to test failure condition
- 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
- 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
- 22:32 inflatador: T276198 disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
- 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
- 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
- 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
- 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24 refs T300200
- 20:33 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
- 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643) (duration: 00m 55s)
- 20:05 mutante: alert1001 - re-enabled puppet
- 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs T300200 (duration: 53m 17s)
- 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
- 19:38 mutante: mw1449 - scap pull
- 19:36 mutante: mw1414 - scap pull
- 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24 refs T300200
- 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
- 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 18:57 brennen: 1.38.0-wmf.24 train (T300200): there's currently a single blocker at T302643; staging to testwikis and holding there until backport's available
- 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
- 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
- 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
- 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
- 17:52 cwhite: completed grafana upgrade in eqiad T282863
- 17:50 herron: re-enabling puppet and ircecho on alert1001
- 17:47 cwhite: upgrade grafana in eqiad T282863
- 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
- 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
- 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
- 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
- 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
- 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
- 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 16:55 dancy@deploy1002: Started scap: testing container image build
- 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
- 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
- 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
- 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s)
- 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
- 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195
- 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
- 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
- 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
- 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
- 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
- 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
- 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
- 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
- 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - T290005 T271421
- 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
- 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
- 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
- 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
- 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
- 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
- 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
- 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:53 mmandere: restart purged on cp60[15-16]
- 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
- 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
- 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
- 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
- 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
- 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:40 kormat: Deploying wmfmariadbpy 0.9 T302796
- 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o T302796
- 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
- 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
- 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
- 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
- 13:15 XioNoX: restart cr1-drmrs for software upgrade
- 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
- 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - T290005 T271421
- 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
- 12:39 moritzm: installing expat security updates
- 12:34 mmandere: restart purged on cp60[12-14]
- 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
- 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
- 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
- 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
- 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
- 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
- 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
- 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
- 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
- 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
- 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
- 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843
- 11:18 _joe_: also removed the ipvsadm entry for apaches:80 T244843
- 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - T302744
- 11:17 _joe_: restarting pybal on lvs1020 T244843
- 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
- 11:11 _joe_: restarted pybal on lvs2009, T244843
- 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
- 11:07 _joe_: restarted pybal on lvs2010, T244843
- 11:02 mmandere: restart purged on cp60[09,10,11]
- 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
- 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
- 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
- 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
- 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
- 10:31 mmandere: restart purged on cp600[6-8]
- 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - T290005 T271421
- 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
- 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
- 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
- 09:31 _joe_: restart pybal on lvs1020
- 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
- 09:25 elukey: restart varnishkafka-webrequest on cp6009 as an attempt to clear a weird status of librdkafka (delivery errors to kafka)
- 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
- 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
- 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
- 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
- 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820
- 09:20 _joe_: restarted pybal on lvs2010
- 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
- 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
- 09:06 elukey: restart purged on cp6005
- 08:57 elukey: restart purged on cp6004
- 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
- 08:27 urbanecm: UTC morning B&C window done
- 08:25 elukey: restart purged on cp6003
- 08:16 moritzm: drain instances off ganeti2008 for eventual decom
- 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: d149208: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s)
- 07:59 elukey: restart purged on cp6002
- 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s)
- 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test
- 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
- 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464
- 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye}
- 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
- 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
- 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
- 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
- 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
- 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
- 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
- 00:17 mutante: 15.wikipedia.org on k8s (staging) [deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171