You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Server Admin Log: Difference between revisions
imported>Stashbot (legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .)
imported>Stashbot (catrope@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/: Backport: Do not load common.js twice (T300070) and Fix bug in SkinVersionLookup (T299971) (duration: 00m 51s))
(106 intermediate revisions by 2 users not shown)
== 2022-01-26 ==
* 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/: Backport: [[gerrit:756997{{!}}Do not load common.js twice (T300070)]] and [[gerrit:756696{{!}}Fix bug in SkinVersionLookup (T299971)]] (duration: 00m 51s)
* 01:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:56 catrope@deploy1002: Synchronized php-1.38.0-wmf.19/skins/Vector/: Backport: [[gerrit:756998{{!}}Do not load common.js twice (T300070)]] (duration: 02m 43s)
* 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 00:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 00:11 ryankemper: [[phab:T294805|T294805]] Reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003 (elasticsearch-oss dependency issues, will pick this back up tomorrow); re-enabling puppet across elastic1*
* 00:03 ryankemper: [[phab:T294805|T294805]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003; running puppet on `elastic1068` to make it join the fleet
== 2022-01-25 ==
* 23:42 ryankemper: [[phab:T294805|T294805]] [Elastic] Step 2: Disabling puppet in advance of merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/736117
* 23:20 ryankemper: [[phab:T294805|T294805]] [Elastic] Merged https://gerrit.wikimedia.org/r/736116, step 1 of bringing new eqiad 10G refresh hosts into service
* 21:20 bblack@cumin1001: conftool action : set/weight=100; selector: dc=drmrs,service=ats-be
* 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=varnish-fe
* 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=ats-tls
* 21:03 cwhite: end transition to logstash output opensearch plugin [[phab:T299168|T299168]]
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:17 cwhite: begin transition to logstash output opensearch plugin [[phab:T299168|T299168]]
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]]
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS buster
* 20:01 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): testwiki sync finished, still no open blockers, proceeding to group0
* 19:50 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]] (duration: 52m 01s)
* 19:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
* 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:35 cmjohnson1: updating firmware ganeti1006 [[phab:T299527|T299527]]
* 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1028 master of es3 [[phab:T299911|T299911]]', diff saved to https://phabricator.wikimedia.org/P19221 and previous config saved to /var/cache/conftool/dbconfig/20220125-191238-ladsgroup.json
* 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19220 and previous config saved to /var/cache/conftool/dbconfig/20220125-190949-ladsgroup.json
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 19:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:58 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.19 refs [[phab:T293960|T293960]]
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19219 and previous config saved to /var/cache/conftool/dbconfig/20220125-185444-ladsgroup.json
* 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19218 and previous config saved to /var/cache/conftool/dbconfig/20220125-184714-root.json
* 18:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19217 and previous config saved to /var/cache/conftool/dbconfig/20220125-183940-ladsgroup.json
* 18:38 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync on production
* 18:34 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply on production
* 18:33 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: sync on production
* 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19216 and previous config saved to /var/cache/conftool/dbconfig/20220125-183210-root.json
* 18:31 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply on production
* 18:30 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: sync on production
* 18:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply on production
* 18:28 moritzm: installing policykit-1 security updates on buster
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19215 and previous config saved to /var/cache/conftool/dbconfig/20220125-182435-ladsgroup.json
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1028.eqiad.wmnet with OS bullseye
* 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19214 and previous config saved to /var/cache/conftool/dbconfig/20220125-181706-root.json
* 18:14 brennen: train 1.38.0-wmf.19 ([[phab:T293960|T293960]]): no open blockers, starting stage-train script shortly
* 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19213 and previous config saved to /var/cache/conftool/dbconfig/20220125-180203-root.json
* 18:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19212 and previous config saved to /var/cache/conftool/dbconfig/20220125-174659-root.json
* 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1028.eqiad.wmnet with OS bullseye
* 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19211 and previous config saved to /var/cache/conftool/dbconfig/20220125-173156-root.json
* 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19210 and previous config saved to /var/cache/conftool/dbconfig/20220125-171652-root.json
* 17:02 cwhite: upgrade elasticsearch-curator on apifeatureusage1001
* 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19209 and previous config saved to /var/cache/conftool/dbconfig/20220125-170148-root.json
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19208 and previous config saved to /var/cache/conftool/dbconfig/20220125-164900-ladsgroup.json
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19207 and previous config saved to /var/cache/conftool/dbconfig/20220125-164645-root.json
* 16:46 taavi: deploy updated patch for [[phab:T285116|T285116]]
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1031 master of es3 [[phab:T299911|T299911]]', diff saved to https://phabricator.wikimedia.org/P19206 and previous config saved to /var/cache/conftool/dbconfig/20220125-164324-ladsgroup.json
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19204 and previous config saved to /var/cache/conftool/dbconfig/20220125-164118-ladsgroup.json
* 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19203 and previous config saved to /var/cache/conftool/dbconfig/20220125-163721-marostegui.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19202 and previous config saved to /var/cache/conftool/dbconfig/20220125-163141-root.json
* 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19201 and previous config saved to /var/cache/conftool/dbconfig/20220125-163054-root.json
* 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19200 and previous config saved to /var/cache/conftool/dbconfig/20220125-162613-ladsgroup.json
* 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19199 and previous config saved to /var/cache/conftool/dbconfig/20220125-162217-marostegui.json
* 16:21 cmjohnson1: updating firmware ganeti1005 [[phab:T299527|T299527]]
* 16:18 cmjohnson1: updating firmware ganeti1014 [[phab:T299527|T299527]]
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19198 and previous config saved to /var/cache/conftool/dbconfig/20220125-161550-root.json
* 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19197 and previous config saved to /var/cache/conftool/dbconfig/20220125-161108-ladsgroup.json
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19196 and previous config saved to /var/cache/conftool/dbconfig/20220125-160712-marostegui.json
* 16:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
* 16:06 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
* 16:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1022.eqiad.wmnet with OS bullseye
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19195 and previous config saved to /var/cache/conftool/dbconfig/20220125-160522-marostegui.json
* 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19194 and previous config saved to /var/cache/conftool/dbconfig/20220125-160047-root.json
* 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19193 and previous config saved to /var/cache/conftool/dbconfig/20220125-155604-ladsgroup.json
* 15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1034.eqiad.wmnet with OS bullseye
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19192 and previous config saved to /var/cache/conftool/dbconfig/20220125-155207-marostegui.json
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19191 and previous config saved to /var/cache/conftool/dbconfig/20220125-155101-marostegui.json
* 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19190 and previous config saved to /var/cache/conftool/dbconfig/20220125-155053-marostegui.json
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19189 and previous config saved to /var/cache/conftool/dbconfig/20220125-155017-marostegui.json
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19187 and previous config saved to /var/cache/conftool/dbconfig/20220125-154543-root.json
* 15:38 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6002.drmrs.wmnet
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19186 and previous config saved to /var/cache/conftool/dbconfig/20220125-153548-marostegui.json
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19185 and previous config saved to /var/cache/conftool/dbconfig/20220125-153511-marostegui.json
* 15:34 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 15:32 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 15:31 godog: centrallog1001:~# lvextend --resizefs --size +23G /dev/centrallog1001-vg/data
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19184 and previous config saved to /var/cache/conftool/dbconfig/20220125-153040-root.json
* 15:24 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6002.drmrs.wmnet
* 15:21 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=ncredir6002.*
* 15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1034.eqiad.wmnet with OS bullseye
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19183 and previous config saved to /var/cache/conftool/dbconfig/20220125-152044-marostegui.json
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19182 and previous config saved to /var/cache/conftool/dbconfig/20220125-152006-marostegui.json
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19181 and previous config saved to /var/cache/conftool/dbconfig/20220125-151900-marostegui.json
* 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19180 and previous config saved to /var/cache/conftool/dbconfig/20220125-151852-marostegui.json
* 15:18 mmandere@cumin1001: conftool action : select; selector: cluster=necredir,dc=drmrs
* 15:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:17 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19179 and previous config saved to /var/cache/conftool/dbconfig/20220125-151536-root.json
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19178 and previous config saved to /var/cache/conftool/dbconfig/20220125-150539-marostegui.json
* 15:04 bblack: lvs6002: restarting pybal
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19177 and previous config saved to /var/cache/conftool/dbconfig/20220125-150348-marostegui.json
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 15:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 15:03 bblack: lvs600[13]: restarting pybal
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19176 and previous config saved to /var/cache/conftool/dbconfig/20220125-150256-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19175 and previous config saved to /var/cache/conftool/dbconfig/20220125-150052-ladsgroup.json
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19174 and previous config saved to /var/cache/conftool/dbconfig/20220125-150031-root.json
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19173 and previous config saved to /var/cache/conftool/dbconfig/20220125-144843-marostegui.json
* 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19172 and previous config saved to /var/cache/conftool/dbconfig/20220125-144548-ladsgroup.json
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19171 and previous config saved to /var/cache/conftool/dbconfig/20220125-144528-root.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19170 and previous config saved to /var/cache/conftool/dbconfig/20220125-143338-marostegui.json
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19169 and previous config saved to /var/cache/conftool/dbconfig/20220125-143232-marostegui.json
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19168 and previous config saved to /var/cache/conftool/dbconfig/20220125-143218-marostegui.json
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19167 and previous config saved to /var/cache/conftool/dbconfig/20220125-143043-ladsgroup.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19166 and previous config saved to /var/cache/conftool/dbconfig/20220125-143024-root.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s8 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P19165 and previous config saved to /var/cache/conftool/dbconfig/20220125-142614-marostegui.json
* 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner1001.eqiad.wmnet on all recursors
* 14:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-runner1001.eqiad.wmnet on all recursors
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19164 and previous config saved to /var/cache/conftool/dbconfig/20220125-141714-marostegui.json
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19163 and previous config saved to /var/cache/conftool/dbconfig/20220125-141538-ladsgroup.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19162 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-root.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19161 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-marostegui.json
* 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19160 and previous config saved to /var/cache/conftool/dbconfig/20220125-141513-marostegui.json
* 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1026.eqiad.wmnet with OS bullseye
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19159 and previous config saved to /var/cache/conftool/dbconfig/20220125-140209-marostegui.json
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19158 and previous config saved to /var/cache/conftool/dbconfig/20220125-140008-marostegui.json
* 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1031.eqiad.wmnet with OS bullseye
* 13:55 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086 (s7,s8) [[phab:T299882|T299882]]', diff saved to https://phabricator.wikimedia.org/P19157 and previous config saved to /var/cache/conftool/dbconfig/20220125-135212-marostegui.json
* 13:50 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 13:48 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19156 and previous config saved to /var/cache/conftool/dbconfig/20220125-134704-marostegui.json
* 13:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19155 and previous config saved to /var/cache/conftool/dbconfig/20220125-134557-marostegui.json
* 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19154 and previous config saved to /var/cache/conftool/dbconfig/20220125-134547-marostegui.json
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19153 and previous config saved to /var/cache/conftool/dbconfig/20220125-134503-marostegui.json
* 13:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1026.eqiad.wmnet with OS bullseye
* 13:38 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 13:33 _joe_: restarted pybal on lvs6003
* 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
* 13:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=ncredir,name=ncredir6001.drmrs.wmnet
* 13:30 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=drmrs,cluster=ncredir
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19151 and previous config saved to /var/cache/conftool/dbconfig/20220125-133042-marostegui.json
* 13:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
* 13:30 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19150 and previous config saved to /var/cache/conftool/dbconfig/20220125-132958-marostegui.json | |||
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19149 and previous config saved to /var/cache/conftool/dbconfig/20220125-132852-marostegui.json | |||
* 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance | |||
* 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance | |||
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19148 and previous config saved to /var/cache/conftool/dbconfig/20220125-132844-marostegui.json | |||
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:26 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1031.eqiad.wmnet with OS bullseye
* 13:22 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
* 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
* 13:20 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
* 13:19 taavi@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:752296{{!}}wikitech: use ldap-rw.$SITE for ldap access (T295150)]] (duration: 00m 49s)
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1026 [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19147 and previous config saved to /var/cache/conftool/dbconfig/20220125-131727-marostegui.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1030 to es2 master [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19146 and previous config saved to /var/cache/conftool/dbconfig/20220125-131622-marostegui.json
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19145 and previous config saved to /var/cache/conftool/dbconfig/20220125-131537-marostegui.json
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19144 and previous config saved to /var/cache/conftool/dbconfig/20220125-131340-marostegui.json
* 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19143 and previous config saved to /var/cache/conftool/dbconfig/20220125-130032-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19142 and previous config saved to /var/cache/conftool/dbconfig/20220125-125923-marostegui.json
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19141 and previous config saved to /var/cache/conftool/dbconfig/20220125-125857-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19140 and previous config saved to /var/cache/conftool/dbconfig/20220125-125835-marostegui.json
* 12:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
* 12:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on staging
* 12:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync on production
* 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
* 12:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on staging
* 12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync on production
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19139 and previous config saved to /var/cache/conftool/dbconfig/20220125-124352-marostegui.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19138 and previous config saved to /var/cache/conftool/dbconfig/20220125-124330-marostegui.json
* 12:38 Lucas_WMDE: UTC morning backport window done
* 12:37 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/modules: Backport (2/2): [[gerrit:756941{{!}}Add an image: update onboarding images for desktop (T298109)]] (duration: 00m 49s)
* 12:36 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/images: Backport (1/2): [[gerrit:756941{{!}}Add an image: update onboarding images for desktop (T298109)]] (duration: 00m 50s)
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es1031 ([[phab:T299911|T299911]])', diff saved to https://phabricator.wikimedia.org/P19136 and previous config saved to /var/cache/conftool/dbconfig/20220125-123303-ladsgroup.json
* 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19135 and previous config saved to /var/cache/conftool/dbconfig/20220125-122848-marostegui.json
* 12:17 hnowlan: removal of restbase2011 from cassandra cluster complete
* 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19134 and previous config saved to /var/cache/conftool/dbconfig/20220125-121343-marostegui.json
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755330{{!}}Enable statement usage tracking for Armenian Wikipedia (hywiki) (T296382)]] (duration: 00m 50s)
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19133 and previous config saved to /var/cache/conftool/dbconfig/20220125-120632-marostegui.json
* 12:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19132 and previous config saved to /var/cache/conftool/dbconfig/20220125-120625-marostegui.json
* 11:57 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=appserver,service=canary
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19131 and previous config saved to /var/cache/conftool/dbconfig/20220125-115120-marostegui.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19130 and previous config saved to /var/cache/conftool/dbconfig/20220125-114311-marostegui.json
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19129 and previous config saved to /var/cache/conftool/dbconfig/20220125-114258-marostegui.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19128 and previous config saved to /var/cache/conftool/dbconfig/20220125-113616-marostegui.json
* 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2021.codfw.wmnet with OS bullseye
* 11:29 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1011.eqiad.wmnet
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19127 and previous config saved to /var/cache/conftool/dbconfig/20220125-112753-marostegui.json
* 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2027.codfw.wmnet with OS bullseye
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19126 and previous config saved to /var/cache/conftool/dbconfig/20220125-112111-marostegui.json
* 11:19 moritzm: installing apache security updates
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19125 and previous config saved to /var/cache/conftool/dbconfig/20220125-111249-marostegui.json
* 11:07 godog: temp disable alerting on prometheus200[56] - [[phab:T296199|T296199]]
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19124 and previous config saved to /var/cache/conftool/dbconfig/20220125-105744-marostegui.json
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19123 and previous config saved to /var/cache/conftool/dbconfig/20220125-105636-marostegui.json
* 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19122 and previous config saved to /var/cache/conftool/dbconfig/20220125-105628-marostegui.json
* 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2021.codfw.wmnet with OS bullseye
* 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2027.codfw.wmnet with OS bullseye
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:50 hnowlan: disabling puppet on all maps hosts to test cassandra removal
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2011.eqiad.wmnet
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2020', diff saved to https://phabricator.wikimedia.org/P19121 and previous config saved to /var/cache/conftool/dbconfig/20220125-104331-marostegui.json
* 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2020.codfw.wmnet with OS bullseye
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19120 and previous config saved to /var/cache/conftool/dbconfig/20220125-104124-marostegui.json
* 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2029.codfw.wmnet with OS bullseye
* 10:36 hnowlan: nodetool removenode for restbase2011-c
* 10:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 [[phab:T299123|T299123]]', diff saved to https://phabricator.wikimedia.org/P19119 and previous config saved to /var/cache/conftool/dbconfig/20220125-102912-marostegui.json
* 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19118 and previous config saved to /var/cache/conftool/dbconfig/20220125-102619-marostegui.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19117 and previous config saved to /var/cache/conftool/dbconfig/20220125-102448-marostegui.json
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19116 and previous config saved to /var/cache/conftool/dbconfig/20220125-102426-marostegui.json
* 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19115 and previous config saved to /var/cache/conftool/dbconfig/20220125-101114-marostegui.json
* 10:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19114 and previous config saved to /var/cache/conftool/dbconfig/20220125-100921-marostegui.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19113 and previous config saved to /var/cache/conftool/dbconfig/20220125-100907-marostegui.json
* 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19112 and previous config saved to /var/cache/conftool/dbconfig/20220125-100900-marostegui.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:04 taavi@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:755534{{!}}Undeploy UserMerge (3) (T216089)]] (duration: 00m 48s)
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2020.codfw.wmnet with OS bullseye
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2029.codfw.wmnet with OS bullseye
* 10:01 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755533{{!}}Undeploy UserMerge (2) (T216089)]] (duration: 00m 49s)
* 10:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020', diff saved to https://phabricator.wikimedia.org/P19111 and previous config saved to /var/cache/conftool/dbconfig/20220125-100036-marostegui.json
* 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - [[phab:T299911|T299911]]
* 09:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 09:59 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:755532{{!}}Undeploy UserMerge (1) (T216089)]] (duration: 00m 49s)
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19110 and previous config saved to /var/cache/conftool/dbconfig/20220125-095417-marostegui.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19109 and previous config saved to /var/cache/conftool/dbconfig/20220125-095355-marostegui.json
* 09:40 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:40 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:40 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6001.drmrs.wmnet
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19108 and previous config saved to /var/cache/conftool/dbconfig/20220125-093912-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19107 and previous config saved to /var/cache/conftool/dbconfig/20220125-093850-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T299827|T299827]])', diff saved to https://phabricator.wikimedia.org/P19106 and previous config saved to /var/cache/conftool/dbconfig/20220125-093806-marostegui.json
* 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:23 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6001.drmrs.wmnet
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19105 and previous config saved to /var/cache/conftool/dbconfig/20220125-092346-marostegui.json
* 09:23 dcausse: restarting blazegraph on wdqs1004 (jvm stuck for 1h)
* 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1013.eqiad.wmnet with OS buster
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19104 and previous config saved to /var/cache/conftool/dbconfig/20220125-085228-root.json
* 08:45 moritzm: draining instances off ganeti1005 for reimage
* 08:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1013.eqiad.wmnet with OS buster
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19103 and previous config saved to /var/cache/conftool/dbconfig/20220125-083724-root.json
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:32 jayme: kubernetes staging migrated tainted worker node setup - [[phab:T290967|T290967]]
* 08:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
* 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 08:25 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1013 to master in pc3 [[phab:T299046|T299046]] (duration: 00m 49s)
* 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19102 and previous config saved to /var/cache/conftool/dbconfig/20220125-082326-marostegui.json
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19101 and previous config saved to /var/cache/conftool/dbconfig/20220125-082319-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19100 and previous config saved to /var/cache/conftool/dbconfig/20220125-082220-root.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19099 and previous config saved to /var/cache/conftool/dbconfig/20220125-080814-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19098 and previous config saved to /var/cache/conftool/dbconfig/20220125-080717-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19097 and previous config saved to /var/cache/conftool/dbconfig/20220125-075309-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19096 and previous config saved to /var/cache/conftool/dbconfig/20220125-075213-root.json
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19095 and previous config saved to /var/cache/conftool/dbconfig/20220125-073805-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19094 and previous config saved to /var/cache/conftool/dbconfig/20220125-073709-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19093 and previous config saved to /var/cache/conftool/dbconfig/20220125-073457-marostegui.json
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19092 and previous config saved to /var/cache/conftool/dbconfig/20220125-073450-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19091 and previous config saved to /var/cache/conftool/dbconfig/20220125-072206-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19090 and previous config saved to /var/cache/conftool/dbconfig/20220125-071945-marostegui.json
* 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bullseye
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19089 and previous config saved to /var/cache/conftool/dbconfig/20220125-070702-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19088 and previous config saved to /var/cache/conftool/dbconfig/20220125-070441-marostegui.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19087 and previous config saved to /var/cache/conftool/dbconfig/20220125-065158-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19086 and previous config saved to /var/cache/conftool/dbconfig/20220125-064936-marostegui.json
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bullseye
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19085 and previous config saved to /var/cache/conftool/dbconfig/20220125-064829-marostegui.json
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance | |||
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19084 and previous config saved to /var/cache/conftool/dbconfig/20220125-064801-marostegui.json | |||
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19083 and previous config saved to /var/cache/conftool/dbconfig/20220125-063655-root.json | |||
* 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1030.eqiad.wmnet with OS bullseye | |||
* 06:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 06:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19082 and previous config saved to /var/cache/conftool/dbconfig/20220125-063256-marostegui.json | |||
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 06:26 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc3 [[phab:T299046|T299046]] (duration: 00m 49s) | |||
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19081 and previous config saved to /var/cache/conftool/dbconfig/20220125-061751-marostegui.json | |||
* 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1030.eqiad.wmnet with OS bullseye | |||
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19080 and previous config saved to /var/cache/conftool/dbconfig/20220125-060247-marostegui.json | |||
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1030 [[phab:T299889|T299889]]', diff saved to https://phabricator.wikimedia.org/P19079 and previous config saved to /var/cache/conftool/dbconfig/20220125-060241-marostegui.json | |||
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P19078 and previous config saved to /var/cache/conftool/dbconfig/20220125-060128-marostegui.json | |||
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance | |||
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 02:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:29 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755834{{!}}Lower The Wikipedia Library editcount]] (duration: 00m 49s) | |||
* 00:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:23 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756585{{!}}Enable wgMinervaEnableSiteNotice for bnwiki (T299529)]] (duration: 00m 49s) | |||
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 00:14 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:756712{{!}}bgwiki: fix setup for Draft namespace (T299224)]] (duration: 00m 49s) | |||
* 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
== 2022-01-24 == | |||
* 23:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 23:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 23:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 23:29 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:756720{{!}}Revert "Choose wikiversions.php file relative to MWMultiVersion.php"]] (duration: 00m 49s) | |||
* 22:54 ryankemper: [[phab:T280001|T280001]] Removed downtime on `wcqs*` | |||
* 22:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS buster | |||
* 22:48 ryankemper: [[phab:T280001|T280001]] Moved `wcqs` service state into `production` by merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/756713; running puppet on authdns/alert hosts | |||
* 22:32 inflatador: [[phab:T280001|T280001]] [[phab:T282117|T282117]] Merged https://gerrit.wikimedia.org/r/c | |||
== 2022-01-23 == | |||
* 22:02 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) (duration: 00m 08s) | |||
* 22:02 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) | |||
* 21:27 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided) (duration: 00m 09s) | |||
* 21:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided) | |||
== 2022-01-22 == | |||
* 22:38 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 22:38 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 14:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 14:51 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 08:35 elukey: `apt-get clean` on an-test-coord1001 to free some space | |||
* 08:25 elukey: remove the `--debug=true` etcd daemon arg from ml-etcd2002 (only node having it, probably a manual test done in the past) and cleaned up spammy etcd logs to free space | |||
* 01:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 01:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 00:27 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb | |||
== 2022-01-21 == | |||
* 22:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 22:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 21:38 brennen@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/VisualEditor/modules/ve-mw: Backport: [[gerrit:756066{{!}}Revert "Re-duplicate deduplicated TemplateStyles" (T287675 T299251 T299767)]] (duration: 00m 49s) | |||
* 21:21 topranks: Running homer against cr1-eqiad and cr2-eqiad to remove entries on analytics-in4/6 filters that refer to decommissioned deb mirror host sodium. | |||
* 19:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:10 ayounsi@cumin1001: START - Cookbook sre.dns.netbox | |||
* 19:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox | |||
* 18:46 herron: restarting pybal on lvs1015,lvs1020,lvs2009,lvs2010 to remove legacy elk5 services [[phab:T299700|T299700]] | |||
* 18:39 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:36 robh@cumin1001: START - Cookbook sre.dns.netbox | |||
* 18:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:42 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.4-1_amd64.changes | |||
* 16:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet | |||
* 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1021.eqiad.wmnet with OS buster | |||
* 16:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS buster | |||
* 16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet | |||
* 16:46 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1020.eqiad.wmnet with OS buster | |||
* 16:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sodium.wikimedia.org | |||
* 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS buster | |||
* 16:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 16:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 16:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts sodium.wikimedia.org | |||
* 16:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet | |||
* 16:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1019.eqiad.wmnet with OS buster | |||
* 16:02 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom | |||
* 16:02 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom | |||
* 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage | |||
* 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage | |||
* 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage | |||
* 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage | |||
* 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet | |||
* 15:50 moritzm: added ganeti1025 to Ganeti eqiad cluster [[phab:T293909|T293909]] | |||
* 15:29 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 15:29 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing | |||
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster | |||
* 15:24 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS buster | |||
* 15:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1018.eqiad.wmnet | |||
* 15:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1018.eqiad.wmnet with OS buster | |||
* 15:07 herron: removing kibana.discovery.wmnet record and switching legacy elk LVS instances to state: lvs_setup [[phab:T299700|T299700]] | |||
* 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' . | |||
* 14:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 14:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster | |||
* 14:35 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 07s) | |||
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1018.eqiad.wmnet with OS buster | |||
* 14:35 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 13:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster | |||
* 13:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1017.eqiad.wmnet with OS buster | |||
* 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1017.eqiad.wmnet | |||
* 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2025.codfw.wmnet | |||
* 13:01 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 13:01 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1016.eqiad.wmnet | |||
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2024.codfw.wmnet | |||
* 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1017.eqiad.wmnet with OS buster | |||
* 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster | |||
* 12:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster | |||
* 12:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1016.eqiad.wmnet with OS buster | |||
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet | |||
* 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet | |||
* 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet | |||
* 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1016.eqiad.wmnet with OS buster | |||
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster | |||
* 11:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2023.codfw.wmnet | |||
* 11:15 vgutierrez: pool cp3063 running envoy as TLS termination layer - [[phab:T271421|T271421]] | |||
* 11:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2023.codfw.wmnet with OS buster | |||
* 10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS buster | |||
* 10:33 moritzm: migrate primary/secondary instances off ganeti1013 | |||
* 10:14 moritzm: switch kubetcd1006 back to plain disks | |||
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks | |||
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks | |||
* 10:09 moritzm: switch kubetcd1005 back to plain disks | |||
* 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2023.codfw.wmnet with OS buster | |||
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks | |||
* 10:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks | |||
* 09:51 moritzm: switch kubetcd1004 back to plain disks | |||
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks | |||
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks | |||
* 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster | |||
* 09:40 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3063.esams.wmnet with OS buster | |||
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18970 and previous config saved to /var/cache/conftool/dbconfig/20220121-093120-root.json | |||
* 09:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18969 and previous config saved to /var/cache/conftool/dbconfig/20220121-091617-root.json | |||
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 09:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox | |||
* 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) | |||
* 09:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox | |||
* 09:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster | |||
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18968 and previous config saved to /var/cache/conftool/dbconfig/20220121-090113-root.json | |||
* 09:00 vgutierrez: depool cp3063 to be reimaged as cache::upload_envoy - [[phab:T271421|T271421]] | |||
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18967 and previous config saved to /var/cache/conftool/dbconfig/20220121-084609-root.json | |||
* 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet | |||
* 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet | |||
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet | |||
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18966 and previous config saved to /var/cache/conftool/dbconfig/20220121-083106-root.json | |||
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet | |||
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18965 and previous config saved to /var/cache/conftool/dbconfig/20220121-081602-root.json | |||
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18964 and previous config saved to /var/cache/conftool/dbconfig/20220121-080058-root.json | |||
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18963 and previous config saved to /var/cache/conftool/dbconfig/20220121-075801-root.json | |||
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18962 and previous config saved to /var/cache/conftool/dbconfig/20220121-074555-root.json | |||
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18961 and previous config saved to /var/cache/conftool/dbconfig/20220121-074257-root.json | |||
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18960 and previous config saved to /var/cache/conftool/dbconfig/20220121-073051-root.json | |||
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1032.eqiad.wmnet with OS bullseye | |||
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18959 and previous config saved to /var/cache/conftool/dbconfig/20220121-072754-root.json | |||
* 07:26 elukey: elukey@stat1007:~$ sudo systemctl reset-failed product-analytics-movement-metrics.service | |||
* 07:21 elukey: elukey@build2001:~$ sudo systemctl reset-failed ifup@ens13.service | |||
* 07:19 elukey: systemctl reset-failed session-3.scope on an-test-client1001 (failed, transient unit) | |||
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18958 and previous config saved to /var/cache/conftool/dbconfig/20220121-071250-root.json | |||
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1032.eqiad.wmnet with OS bullseye | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for reimage [[phab:T299741|T299741]]', diff saved to https://phabricator.wikimedia.org/P18957 and previous config saved to /var/cache/conftool/dbconfig/20220121-065854-marostegui.json | |||
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18956 and previous config saved to /var/cache/conftool/dbconfig/20220121-065746-root.json | |||
* 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS bullseye | |||
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18955 and previous config saved to /var/cache/conftool/dbconfig/20220121-064243-root.json | |||
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18954 and previous config saved to /var/cache/conftool/dbconfig/20220121-062739-root.json | |||
* 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS bullseye | |||
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master [[phab:T299741|T299741]]', diff saved to https://phabricator.wikimedia.org/P18953 and previous config saved to /var/cache/conftool/dbconfig/20220121-062116-marostegui.json | |||
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2030.codfw.wmnet with OS bullseye | |||
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18952 and previous config saved to /var/cache/conftool/dbconfig/20220121-061235-root.json | |||
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18951 and previous config saved to /var/cache/conftool/dbconfig/20220121-055732-root.json | |||
* 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2030.codfw.wmnet with OS bullseye | |||
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18950 and previous config saved to /var/cache/conftool/dbconfig/20220121-054228-root.json | |||
== 2022-01-20 == | |||
* 22:40 inflatador: running puppet-merge for https://gerrit.wikimedia.org/r/755810 | |||
* 22:27 urandom: rolling restart of Cassandra, aqs-next -- [[phab:T298516|T298516]] | |||
* 21:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster | |||
* 20:58 jhathaway: rebooting mx1001 to test new kernel | |||
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 20:37 urandom: upgrading Cassandra to 3.11.11, aqs1010 -- [[phab:T298516|T298516]] | |||
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 20:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]] | |||
* 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster | |||
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 20:31 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/DiscussionTools/includes/HeadingItem.php: Backport: [[gerrit:755684{{!}}Prevent assertion failure caused by empty headings (T299583)]] (duration: 00m 50s) | |||
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 19:38 bd808@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Remove password clear on block (duration: 00m 50s) | |||
* 19:19 jhathaway: rebooting mx1001 to test new kernel | |||
* 19:17 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main | |||
* 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 19:14 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main | |||
* 19:13 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main | |||
* | * 19:11 cjming: end of UTC evening backport & config window | ||
* | * 19:10 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main | ||
* | * 19:10 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main | ||
* | * 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 19:08 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main | ||
* | * 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* 19:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755745{{!}}Disable language alert for pilot wikis except thwiki, viwiki. (T295555)]] (duration: 00m 51s) | |||
* | * 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* | * 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 18:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 18:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/includes/Hooks.php: Backport: [[gerrit:755682{{!}}Do not try to make watchlist collapsible on wikis where watchlist is disabled (T299671)]] (duration: 00m 50s) | ||
* 18:27 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755741 enhancements for the settings benchmark entrypoint (duration: 00m 51s) | |||
* | * 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet | ||
* | * 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS buster | ||
* 18:17 mutante: running puppet on cp403* | |||
* | * 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS buster | ||
* | * 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet | ||
* | * 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS buster | ||
* | * 17:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster | ||
* | * 17:18 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755678{{!}}Revert "Make Block objects aware of which wiki they belong to"]] (duration: 00m 55s) | ||
* | * 17:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster | ||
* | * 17:15 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1008.eqiad.wmnet with OS buster | ||
* | * 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* | * 17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 17:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster | ||
* | * 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS buster | ||
* | * 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | ||
* 17:04 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference | |||
* | * 17:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2020.codfw.wmnet with OS buster | ||
* | * 17:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox | ||
* | * 16:55 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS buster | ||
* | * 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster | ||
* | * 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* | * 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | ||
* | * 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | ||
* 16:50 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755399 add temporary entrypoint for settings benchmark (duration: 00m 50s) | |||
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster | |||
* 16:48 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster | |||
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster | |||
* 16:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet | |||
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS buster | |||
* 15:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS buster | |||
* 15:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 15:46 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 15:43 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 15:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 15:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 15:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 15:20 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main | |||
* 15:16 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main | |||
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet | |||
* 15:14 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main | |||
* 15:13 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 15:12 moritzm: enabled hardware virtualisation in BIOS for ganeti1028 [[phab:T293909|T293909]] | |||
* 15:11 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main | |||
* 15:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet | |||
* 15:05 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 15:05 moritzm: enabled hardware virtualisation in BIOS for ganeti1027 [[phab:T293909|T293909]] | |||
* 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet | |||
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 14:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 14:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet | |||
* 14:56 moritzm: enabled hardware virtualisation in BIOS for ganeti1026 [[phab:T293909|T293909]] | |||
* 14:55 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 11s) | |||
* 14:55 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet | |||
* 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet | |||
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 14:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster | |||
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster | |||
* 14:20 moritzm: enabled hardware virtualisation in BIOS for ganeti1023 [[phab:T283036|T283036]] | |||
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet | |||
* 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet | |||
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet | |||
* 14:03 moritzm: enabled hardware virtualisation in BIOS for ganeti1024 [[phab:T283036|T283036]] | |||
* 13:55 marostegui: Power off es1022 for onsite maintenance [[phab:T299123|T299123]] | |||
* 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet | |||
* 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS | |||
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS | |||
* 13:51 moritzm: enabled hardware virtualisation in BIOS for ganeti1025 [[phab:T293909|T293909]] | |||
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet | |||
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet | |||
* 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS | |||
* 13:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS | |||
* 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/CentralNotice/includes/: Backport: [[gerrit:755670{{!}}Replace remaining usages of IDatabase::fetchObject()/::numRows() (T286694)]] (duration: 00m 50s) | |||
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:03 Lucas_WMDE: UTC morning backport window done | |||
* 13:02 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/deferred/LinksUpdate/LinksUpdate.php: Backport: [[gerrit:755668{{!}}Fix deprecation warning from LinksUpdate::getImages() (T299472)]] (duration: 00m 50s) | |||
* 13:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/maintenance/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (2/2) (duration: 00m 50s) | |||
* 13:00 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: [[gerrit:755667{{!}}Replace remaining usages of IDatabase::fetchObject() (T299471)]] (1/2) (duration: 00m 56s) | |||
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755322{{!}}Enable usage tracking for statements in Waray Wikipedia (T296383)]] (expecting some gradual increase of wbc_entity_usage rows on warwiki) (duration: 00m 51s) | |||
* 12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn | |||
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn | |||
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18943 and previous config saved to /var/cache/conftool/dbconfig/20220120-121520-marostegui.json | |||
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production | |||
* 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging | |||
* 12:10 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production | |||
* 12:09 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production | |||
* 12:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging | |||
* 12:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production | |||
* 12:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging | |||
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production | |||
* 12:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging | |||
* 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging | |||
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production | |||
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging | |||
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging | |||
* 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production | |||
* 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging | |||
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging | |||
* 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production | |||
* 12:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging | |||
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18942 and previous config saved to /var/cache/conftool/dbconfig/20220120-120015-marostegui.json | |||
* 11:49 moritzm: add ganeti1024 to Ganeti eqiad cluster [[phab:T283036|T283036]] | |||
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18941 and previous config saved to /var/cache/conftool/dbconfig/20220120-114510-marostegui.json | |||
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet | |||
* 11:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet | |||
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18940 and previous config saved to /var/cache/conftool/dbconfig/20220120-113006-marostegui.json | |||
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'. | |||
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18939 and previous config saved to /var/cache/conftool/dbconfig/20220120-112854-marostegui.json | |||
* 11:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance | |||
* 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance | |||
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18938 and previous config saved to /var/cache/conftool/dbconfig/20220120-112846-marostegui.json | |||
* 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 11:24 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production | |||
* 11:23 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging | |||
* 11:23 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production | |||
* 11:22 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 11:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s) | |||
* 11:21 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 11:19 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production | |||
* 11:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 11:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 11:18 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging | |||
* 11:18 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production | |||
* 11:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging | |||
* 11:13 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production | |||
* 11:13 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging | |||
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18937 and previous config saved to /var/cache/conftool/dbconfig/20220120-111341-marostegui.json | |||
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18936 and previous config saved to /var/cache/conftool/dbconfig/20220120-105837-marostegui.json | |||
* 10:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s) | |||
* 10:52 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) | |||
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1018.eqiad.wmnet with OS buster | |||
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18935 and previous config saved to /var/cache/conftool/dbconfig/20220120-104332-marostegui.json | |||
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18934 and previous config saved to /var/cache/conftool/dbconfig/20220120-104220-marostegui.json | |||
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance | |||
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance | |||
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance | |||
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance | |||
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18933 and previous config saved to /var/cache/conftool/dbconfig/20220120-104206-marostegui.json | |||
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18932 and previous config saved to /var/cache/conftool/dbconfig/20220120-102702-marostegui.json | |||
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18931 and previous config saved to /var/cache/conftool/dbconfig/20220120-101157-marostegui.json | |||
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18930 and previous config saved to /var/cache/conftool/dbconfig/20220120-095652-marostegui.json | |||
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster | |||
* 09:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1018.eqiad.wmnet with OS buster | |||
* 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster | |||
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18929 and previous config saved to /var/cache/conftool/dbconfig/20220120-092232-marostegui.json | |||
* 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance | |||
* 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance | |||
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18928 and previous config saved to /var/cache/conftool/dbconfig/20220120-092225-marostegui.json | |||
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18927 and previous config saved to /var/cache/conftool/dbconfig/20220120-091127-root.json | |||
* 09:09 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 09:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18926 and previous config saved to /var/cache/conftool/dbconfig/20220120-090720-marostegui.json | |||
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 08:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet | |||
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18925 and previous config saved to /var/cache/conftool/dbconfig/20220120-085623-root.json | |||
* 08:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet | |||
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18924 and previous config saved to /var/cache/conftool/dbconfig/20220120-085215-marostegui.json | |||
* 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet | |||
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18923 and previous config saved to /var/cache/conftool/dbconfig/20220120-084120-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18922 and previous config saved to /var/cache/conftool/dbconfig/20220120-083711-marostegui.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18921 and previous config saved to /var/cache/conftool/dbconfig/20220120-083558-marostegui.json
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18920 and previous config saved to /var/cache/conftool/dbconfig/20220120-083520-marostegui.json
* 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18919 and previous config saved to /var/cache/conftool/dbconfig/20220120-082616-root.json
* 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18918 and previous config saved to /var/cache/conftool/dbconfig/20220120-082015-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for on-site maintenance [[phab:T299123|T299123]]', diff saved to https://phabricator.wikimedia.org/P18917 and previous config saved to /var/cache/conftool/dbconfig/20220120-081809-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18916 and previous config saved to /var/cache/conftool/dbconfig/20220120-081112-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18915 and previous config saved to /var/cache/conftool/dbconfig/20220120-080510-marostegui.json
* 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:57 marostegui: Stop mysql on db1117 to clone db1128 [[phab:T299344|T299344]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18913 and previous config saved to /var/cache/conftool/dbconfig/20220120-075609-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18912 and previous config saved to /var/cache/conftool/dbconfig/20220120-075005-marostegui.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18911 and previous config saved to /var/cache/conftool/dbconfig/20220120-074753-marostegui.json
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18910 and previous config saved to /var/cache/conftool/dbconfig/20220120-074746-marostegui.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18909 and previous config saved to /var/cache/conftool/dbconfig/20220120-074105-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18908 and previous config saved to /var/cache/conftool/dbconfig/20220120-073241-marostegui.json
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18907 and previous config saved to /var/cache/conftool/dbconfig/20220120-072558-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18906 and previous config saved to /var/cache/conftool/dbconfig/20220120-071736-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18905 and previous config saved to /var/cache/conftool/dbconfig/20220120-071054-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18904 and previous config saved to /var/cache/conftool/dbconfig/20220120-070231-marostegui.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18903 and previous config saved to /var/cache/conftool/dbconfig/20220120-070119-marostegui.json
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18902 and previous config saved to /var/cache/conftool/dbconfig/20220120-070052-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18901 and previous config saved to /var/cache/conftool/dbconfig/20220120-065551-root.json
* 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1180.eqiad.wmnet with OS bullseye
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18900 and previous config saved to /var/cache/conftool/dbconfig/20220120-064547-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18899 and previous config saved to /var/cache/conftool/dbconfig/20220120-063042-marostegui.json
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1180.eqiad.wmnet with OS bullseye
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18898 and previous config saved to /var/cache/conftool/dbconfig/20220120-061538-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 [[phab:T299479|T299479]]', diff saved to https://phabricator.wikimedia.org/P18897 and previous config saved to /var/cache/conftool/dbconfig/20220120-061529-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T285149|T285149]])', diff saved to https://phabricator.wikimedia.org/P18896 and previous config saved to /var/cache/conftool/dbconfig/20220120-061407-marostegui.json
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
== 2022-01-19 ==
* 23:36 mutante: deploy1002 - checked freshly generated cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml with 'openssl x509 -noout -text -in .. {{!}} grep DNS'. now has static-bz on it. ([[phab:T281538|T281538]])
* 23:35 mutante: puppetmaster1001 - revoked puppet cert miscweb.discovery.wmnet; updated kube_services.crts.yaml to include static-bugzilla.wikimedia.org, removed miscweb.discovery.wmnet.crt and .csr.pem, used cergen to check and regenerate cert, committed in private repo, ran puppet on deploy1001 - checked cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml with 'openssl x509
* 21:43 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 26s)
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
* 20:52 Krinkle: depool mw1340 (api_appserver) for performance and php-apcu testing
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 20:09 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]] (duration: 00m 49s)
* 20:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.18 refs [[phab:T293959|T293959]]
* 20:04 jhathaway: rebooting mx1001 to debug conntrack
* 19:52 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/tests/phpunit/structure/SettingsTest.php: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 00m 49s)
* 19:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: {{Gerrit|ed5e634772d2821c6f61903f7341eef4f2fc4337}}: First pass on creating config-schema.yaml (duration: 01m 02s)
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1009.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1008.eqiad.wmnet
* 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1007.eqiad.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2006.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2005.codfw.wmnet
* 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2004.codfw.wmnet
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
* 19:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
* 19: | * |