You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply)
(56 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2022-03-26 ==
== 2022-05-22 ==
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23146 and previous config saved to /var/cache/conftool/dbconfig/20220326-011209-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23145 and previous config saved to /var/cache/conftool/dbconfig/20220326-005704-ladsgroup.json
* 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23144 and previous config saved to /var/cache/conftool/dbconfig/20220326-004159-ladsgroup.json
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23143 and previous config saved to /var/cache/conftool/dbconfig/20220326-002653-ladsgroup.json
 
== 2022-03-25 ==
* 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23142 and previous config saved to /var/cache/conftool/dbconfig/20220325-235855-ladsgroup.json
* 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23141 and previous config saved to /var/cache/conftool/dbconfig/20220325-230540-ladsgroup.json
* 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23140 and previous config saved to /var/cache/conftool/dbconfig/20220325-225035-ladsgroup.json
* 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23139 and previous config saved to /var/cache/conftool/dbconfig/20220325-223530-ladsgroup.json
* 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23138 and previous config saved to /var/cache/conftool/dbconfig/20220325-222025-ladsgroup.json
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23137 and previous config saved to /var/cache/conftool/dbconfig/20220325-215400-ladsgroup.json
* 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23136 and previous config saved to /var/cache/conftool/dbconfig/20220325-215346-ladsgroup.json
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23135 and previous config saved to /var/cache/conftool/dbconfig/20220325-213841-ladsgroup.json
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23134 and previous config saved to /var/cache/conftool/dbconfig/20220325-212336-ladsgroup.json
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23133 and previous config saved to /var/cache/conftool/dbconfig/20220325-210831-ladsgroup.json
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23132 and previous config saved to /var/cache/conftool/dbconfig/20220325-210136-ladsgroup.json
* 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23131 and previous config saved to /var/cache/conftool/dbconfig/20220325-210128-ladsgroup.json
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23130 and previous config saved to /var/cache/conftool/dbconfig/20220325-204623-ladsgroup.json
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23129 and previous config saved to /var/cache/conftool/dbconfig/20220325-203118-ladsgroup.json
* 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23128 and previous config saved to /var/cache/conftool/dbconfig/20220325-201613-ladsgroup.json
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23127 and previous config saved to /var/cache/conftool/dbconfig/20220325-195137-ladsgroup.json
* 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23126 and previous config saved to /var/cache/conftool/dbconfig/20220325-192923-ladsgroup.json
* 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23125 and previous config saved to /var/cache/conftool/dbconfig/20220325-191416-ladsgroup.json
* 19:10 mutante: copying dump from deploy server to dumps server: scp -3 deploy1002.eqiad.wmnet:/srv/miscweb/static-bugzilla.tar.gz labstore1006.wikimedia.org:~ ([[phab:T284193|T284193]])
* 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23124 and previous config saved to /var/cache/conftool/dbconfig/20220325-185911-ladsgroup.json
* 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23123 and previous config saved to /var/cache/conftool/dbconfig/20220325-184406-ladsgroup.json
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23122 and previous config saved to /var/cache/conftool/dbconfig/20220325-181439-ladsgroup.json
* 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23121 and previous config saved to /var/cache/conftool/dbconfig/20220325-181431-ladsgroup.json
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23120 and previous config saved to /var/cache/conftool/dbconfig/20220325-175926-ladsgroup.json
* 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23119 and previous config saved to /var/cache/conftool/dbconfig/20220325-174421-ladsgroup.json
* 17:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
* 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23118 and previous config saved to /var/cache/conftool/dbconfig/20220325-172916-ladsgroup.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23117 and previous config saved to /var/cache/conftool/dbconfig/20220325-170154-ladsgroup.json
* 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23116 and previous config saved to /var/cache/conftool/dbconfig/20220325-170146-ladsgroup.json
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23115 and previous config saved to /var/cache/conftool/dbconfig/20220325-164641-ladsgroup.json
* 16:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23114 and previous config saved to /var/cache/conftool/dbconfig/20220325-163136-ladsgroup.json
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23112 and previous config saved to /var/cache/conftool/dbconfig/20220325-161631-ladsgroup.json
* 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23111 and previous config saved to /var/cache/conftool/dbconfig/20220325-154705-ladsgroup.json
* 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23110 and previous config saved to /var/cache/conftool/dbconfig/20220325-154658-ladsgroup.json
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23109 and previous config saved to /var/cache/conftool/dbconfig/20220325-153152-ladsgroup.json
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23108 and previous config saved to /var/cache/conftool/dbconfig/20220325-151647-ladsgroup.json
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23107 and previous config saved to /var/cache/conftool/dbconfig/20220325-150141-ladsgroup.json
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23101 and previous config saved to /var/cache/conftool/dbconfig/20220325-143545-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23100 and previous config saved to /var/cache/conftool/dbconfig/20220325-141301-ladsgroup.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23099 and previous config saved to /var/cache/conftool/dbconfig/20220325-140850-root.json
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23098 and previous config saved to /var/cache/conftool/dbconfig/20220325-135756-ladsgroup.json
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23097 and previous config saved to /var/cache/conftool/dbconfig/20220325-135346-root.json
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23096 and previous config saved to /var/cache/conftool/dbconfig/20220325-134251-ladsgroup.json
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23095 and previous config saved to /var/cache/conftool/dbconfig/20220325-133842-root.json
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23094 and previous config saved to /var/cache/conftool/dbconfig/20220325-132746-ladsgroup.json
* 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23093 and previous config saved to /var/cache/conftool/dbconfig/20220325-132338-root.json
* 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23092 and previous config saved to /var/cache/conftool/dbconfig/20220325-130834-root.json
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23091 and previous config saved to /var/cache/conftool/dbconfig/20220325-130146-ladsgroup.json
* 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23090 and previous config saved to /var/cache/conftool/dbconfig/20220325-130138-ladsgroup.json
* 12:49 hoo: Updated operations/dumps/dcat on snapshot10(08{{!}}09{{!}}11{{!}}12{{!}}13) from {{Gerrit|d4886f6}} to {{Gerrit|a1f46e4}}
* 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23089 and previous config saved to /var/cache/conftool/dbconfig/20220325-124633-ladsgroup.json
* 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23088 and previous config saved to /var/cache/conftool/dbconfig/20220325-123128-ladsgroup.json
* 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23086 and previous config saved to /var/cache/conftool/dbconfig/20220325-121623-ladsgroup.json
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23085 and previous config saved to /var/cache/conftool/dbconfig/20220325-120708-ladsgroup.json
* 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23084 and previous config saved to /var/cache/conftool/dbconfig/20220325-120701-ladsgroup.json
* 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23083 and previous config saved to /var/cache/conftool/dbconfig/20220325-115156-ladsgroup.json
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23082 and previous config saved to /var/cache/conftool/dbconfig/20220325-113651-ladsgroup.json
* 11:24 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23081 and previous config saved to /var/cache/conftool/dbconfig/20220325-112145-ladsgroup.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23080 and previous config saved to /var/cache/conftool/dbconfig/20220325-110217-marostegui.json
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23079 and previous config saved to /var/cache/conftool/dbconfig/20220325-104712-marostegui.json
* 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
* 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23078 and previous config saved to /var/cache/conftool/dbconfig/20220325-103310-ladsgroup.json
* 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23077 and previous config saved to /var/cache/conftool/dbconfig/20220325-103207-marostegui.json
* 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
* 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23076 and previous config saved to /var/cache/conftool/dbconfig/20220325-101701-marostegui.json
* 10:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
* 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23075 and previous config saved to /var/cache/conftool/dbconfig/20220325-101016-ladsgroup.json
* 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23074 and previous config saved to /var/cache/conftool/dbconfig/20220325-095511-ladsgroup.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23073 and previous config saved to /var/cache/conftool/dbconfig/20220325-094031-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23072 and previous config saved to /var/cache/conftool/dbconfig/20220325-094023-marostegui.json
* 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23071 and previous config saved to /var/cache/conftool/dbconfig/20220325-094006-ladsgroup.json
* 09:27 moritzm: updating libapache2-mod-auth-cas on moscovium/debmonitor1002
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23070 and previous config saved to /var/cache/conftool/dbconfig/20220325-092518-marostegui.json
* 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23069 and previous config saved to /var/cache/conftool/dbconfig/20220325-092500-ladsgroup.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23068 and previous config saved to /var/cache/conftool/dbconfig/20220325-091013-marostegui.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23067 and previous config saved to /var/cache/conftool/dbconfig/20220325-085508-marostegui.json
* 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23066 and previous config saved to /var/cache/conftool/dbconfig/20220325-082446-ladsgroup.json
* 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json
* 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json
* 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json
* 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:31 _joe_: deleting a couple zotero pods with excessive number of restarts
* 06:29 marostegui: dbmaint s4@eqiad [[phab:T300775|T300775]]
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json
* 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster
 
== 2022-03-24 ==
* 23:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
* 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
* 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23050 and previous config saved to /var/cache/conftool/dbconfig/20220324-223031-marostegui.json
* 22:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23049 and previous config saved to /var/cache/conftool/dbconfig/20220324-221526-marostegui.json
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
* 22:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
* 22:07 ebernhardson: restart wcqs-blazegraph on wcqs2001 to resolve intermittent BlazegraphFreeAllocatorsDecreasingRapidly
* 22:06 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23048 and previous config saved to /var/cache/conftool/dbconfig/20220324-220021-marostegui.json
* 21:54 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23047 and previous config saved to /var/cache/conftool/dbconfig/20220324-214515-marostegui.json
* 21:42 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 inflatador: bking@cumin1001 restarting blazegraph on wdqs[1003-1013].eqiad.wmnet for [[phab:T293862|T293862]]
* 21:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43385320f417052d8e60791b3cb970e6e3f088d5}}: fawiki: Set celebration logo for new vector ([[phab:T304314|T304314]]; 2/2) (duration: 00m 53s)
* 21:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-fawiki-new-year.png: {{Gerrit|43385320f417052d8e60791b3cb970e6e3f088d5}}: fawiki: Set celebration logo for new vector ([[phab:T304314|T304314]]; 1/2) (duration: 00m 50s)
* 21:07 thcipriani@deploy1002: Finished deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0 (duration: 00m 13s)
* 21:07 thcipriani@deploy1002: Started deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:03 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 50s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:773607{{!}}Start writing to $wmgAllServices the same value as to $wmfAllServices (T45956)]] (duration: 01m 17s)
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I14c5a9aa39}} (duration: 00m 50s)
* 20:41 krinkle@deploy1002: Synchronized src/Profiler.php: {{Gerrit|I14c5a9aa39}} (duration: 00m 49s)
* 20:34 krinkle@deploy1002: Synchronized lib/: {{Gerrit|I3882be35572}} (duration: 00m 50s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 krinkle@deploy1002: Synchronized wmf-config/profiler.php: {{Gerrit|I3882be35572}} (duration: 00m 51s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part 3) (duration: 00m 55s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 thcipriani@deploy1002: Synchronized docroot/noc/db.php: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part II) (duration: 00m 51s)
* 20:28 thcipriani@deploy1002: Synchronized tests: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part I) (duration: 00m 50s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:23 thcipriani@deploy1002: Synchronized portals: Config: [[gerrit:773380{{!}}Bumping portals to master (T282012)]] (duration: 00m 52s)
* 20:22 thcipriani@deploy1002: Synchronized portals/wikipedia.org/assets: Config: [[gerrit:773380{{!}}Bumping portals to master (T282012)]] (duration: 00m 52s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 20:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23045 and previous config saved to /var/cache/conftool/dbconfig/20220324-201305-marostegui.json
* 20:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23044 and previous config saved to /var/cache/conftool/dbconfig/20220324-201257-marostegui.json
* 20:08 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:773602{{!}}Use $wmgUseRestbaseVRS in comment (T45956)]] (duration: 01m 05s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23043 and previous config saved to /var/cache/conftool/dbconfig/20220324-195752-marostegui.json
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23042 and previous config saved to /var/cache/conftool/dbconfig/20220324-194246-marostegui.json
* 19:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23041 and previous config saved to /var/cache/conftool/dbconfig/20220324-192741-marostegui.json
* 19:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1148.eqiad.wmnet with OS buster
* 19:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1147.eqiad.wmnet with OS buster
* 19:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1142.eqiad.wmnet with OS buster
* 18:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
* 18:41 cstone: civicrm revision changed from {{Gerrit|b6ceb722}} to {{Gerrit|4e5b37c3}}
* 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23040 and previous config saved to /var/cache/conftool/dbconfig/20220324-183654-root.json
* 18:36 razzi: razzi@deneb:~$ sudo docker system prune (reclaimed 33GB)
* 18:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
* 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1144.eqiad.wmnet with OS buster
* 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1145.eqiad.wmnet with OS buster
* 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
* 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1142.eqiad.wmnet with OS buster
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23039 and previous config saved to /var/cache/conftool/dbconfig/20220324-182150-root.json
* 18:17 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23038 and previous config saved to /var/cache/conftool/dbconfig/20220324-180646-root.json
* 18:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
* 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
* 17:58 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23037 and previous config saved to /var/cache/conftool/dbconfig/20220324-175142-root.json
* 17:44 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 17:36 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23036 and previous config saved to /var/cache/conftool/dbconfig/20220324-173638-root.json
* 17:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 17:36 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 17:36 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:35 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23035 and previous config saved to /var/cache/conftool/dbconfig/20220324-173450-marostegui.json
* 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 17:34 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 17:32 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 17:32 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|05d55a9}}: fawiki: Set new year celebration ([[phab:T304314|T304314]]; 3/3) (duration: 00m 49s)
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:06 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|05d55a9}}: fawiki: Set new year celebration ([[phab:T304314|T304314]]; 2/3) (duration: 00m 49s)
* 17:04 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|05d55a9}}: fawiki: Set new year celebration ([[phab:T304314|T304314]]; 1/3) (duration: 00m 50s)
* 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye
* 16:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 16:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:25 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 01m 06s)
* 16:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 16:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 16:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 16:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:19 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye
* 16:13 brennen: trainsperiment ([[phab:T300203|T300203]]): blockers clear, logs triaged, rolling 1.39.0-wmf.4 out to all wikis again
* 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye
* 15:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 15:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 15:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
* 15:24 XioNoX: codfw: disable BGP to DE-CIX for link move
* 15:03 moritzm: installing openssl1.0 security updates on stretch
* 14:39 moritzm: installing containerd updates on ml-serve*
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23030 and previous config saved to /var/cache/conftool/dbconfig/20220324-143149-marostegui.json
* 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23029 and previous config saved to /var/cache/conftool/dbconfig/20220324-142233-root.json
* 14:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2002-dev.codfw.wmnet with OS bullseye
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23028 and previous config saved to /var/cache/conftool/dbconfig/20220324-140729-root.json
* 14:00 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
* 13:57 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23026 and previous config saved to /var/cache/conftool/dbconfig/20220324-135225-root.json
* 13:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2002-dev.codfw.wmnet with OS bullseye
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23025 and previous config saved to /var/cache/conftool/dbconfig/20220324-133721-root.json
* 13:34 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
* 13:26 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T45956|T45956]] (duration: 00m 49s)
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 reedy@deploy1002: Synchronized multiversion/: [[phab:T45956|T45956]] (duration: 00m 50s)
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23024 and previous config saved to /var/cache/conftool/dbconfig/20220324-132217-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:21 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
* 13:18 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:15 reedy@deploy1002: Synchronized tests/: [[phab:T45956|T45956]] (duration: 00m 49s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T292802|T292802]] (duration: 00m 50s)
* 12:54 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 for schema change', diff saved to https://phabricator.wikimedia.org/P23023 and previous config saved to /var/cache/conftool/dbconfig/20220324-125225-marostegui.json
* 11:47 jynus: updating eqiad swift-commonswiki backups of originals [[phab:T299764|T299764]]
* 11:26 mmandere: pool cp1076 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:22 jbond: puppet cert clean rendering.svc.eqiad.wmnet
* 11:21 jbond: removing old api.svc.codfw.wmnet.pem and appservers.svc.codfw.wmnet.pem from root@puppetmaster1001:/var/lib/puppet/server/ssl/ca/signed#
* 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
* 11:14 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 11:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS buster
* 11:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
* 11:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
* 10:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
* 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
* 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
* 10:46 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
* 10:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
* 10:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
* 10:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
* 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
* 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
* 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
* 10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
* 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
* 10:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS buster
* 10:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
* 10:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
* 10:20 mmandere: depool cp1076 for reimage - [[phab:T290005|T290005]]
* 10:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
* 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
* 10:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
* 09:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
* 09:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
* 09:31 mmandere: pool cp1078 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 09:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS buster
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:28 jnuche@deploy1002: Synchronized php-1.39.0-wmf.4/includes/Linker.php: (no justification provided) (duration: 00m 50s)
* 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
* 09:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
* 09:00 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
* 08:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS buster
* 08:45 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
* 08:44 marostegui: dbmaint s7@eqiad [[phab:T302658|T302658]]
* 08:43 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
* 08:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1013.eqiad.wmnet with OS bullseye
* 08:36 mmandere: depool cp1078 for reimage - [[phab:T290005|T290005]]
* 08:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
* 08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
* 08:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
* 08:11 marostegui: dbmaint s7@codfw [[phab:T302658|T302658]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P23022 and previous config saved to /var/cache/conftool/dbconfig/20220324-080528-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P23021 and previous config saved to /var/cache/conftool/dbconfig/20220324-075024-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23020 and previous config saved to /var/cache/conftool/dbconfig/20220324-074841-root.json
* 07:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P23019 and previous config saved to /var/cache/conftool/dbconfig/20220324-073520-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23018 and previous config saved to /var/cache/conftool/dbconfig/20220324-073337-root.json
* 07:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
* 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P23017 and previous config saved to /var/cache/conftool/dbconfig/20220324-072017-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23016 and previous config saved to /var/cache/conftool/dbconfig/20220324-071832-root.json
* 07:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P23015 and previous config saved to /var/cache/conftool/dbconfig/20220324-070513-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23014 and previous config saved to /var/cache/conftool/dbconfig/20220324-070327-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for testing', diff saved to https://phabricator.wikimedia.org/P23013 and previous config saved to /var/cache/conftool/dbconfig/20220324-065940-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23012 and previous config saved to /var/cache/conftool/dbconfig/20220324-064823-root.json
* 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 01:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
* 01:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
* 00:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 00:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 00:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 00:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 00:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 00:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
* 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
* 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
 
== 2022-03-23 ==
* 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs [[phab:T300203|T300203]]
* 23:34 brennen: trainsperiment ([[phab:T300203|T300203]]): reverting to 1.39.0-wmf.3 on all wikis for [[phab:T304564|T304564]]; will move forward again after a fix.
* 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes [[phab:T301770|T301770]]
* 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773331{{!}}Enable split A/B testing on beta cluster (T301584)]] (duration: 00m 50s)
* 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:772408{{!}}Allow autoconfirmed users to view basic IP information (T303858)]] and [[gerrit:767216{{!}}Enable IPInfo on testwiki (T260598)]] (duration: 00m 50s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:771448{{!}}DynamicSidebar: remove unused extension (T304006)]] (duration: 00m 49s)
* 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771447{{!}}DynamicSidebar: remove from InitialiseSettings]] (duration: 00m 51s)
* 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771444{{!}}DynamicSidebar: remove from CommonSettings (T304006)]] (duration: 00m 50s)
* 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771443{{!}}wikitech: Remove DynamicSidebar (T304006)]] (duration: 00m 52s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:37 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]]
* 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]]
* 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]]
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:47 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 with 15 minute intervals for watching logs
* 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]] (duration: 49m 41s)
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4 refs [[phab:T300203|T300203]]
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 17:48 brennen: trainsperiment ([[phab:T300203|T300203]]): starting prep for 1.39.0-wmf.4
* 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
* 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # [[phab:T304052|T304052]]
* 15:38 urbanecm: Created shnwikivoyage and guwwiki
* 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
* 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 04s)
* 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 07s)
* 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 06s)
* 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki ([[phab:T303727|T303727]])
* 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 10s)
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage ([[phab:T302797|T302797]])
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
* 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: {{Gerrit|9a0aed0}}: addWiki: Create GrowthExperiment tables for all new Wikipedias ([[phab:T304052|T304052]]) (duration: 01m 06s)
* 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
* 14:37 mmandere: depool cp1080 for reimage - [[phab:T290005|T290005]]
* 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 14:23 bblack: reboot cp1085 (downtimed)
* 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
* 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:48 Lucas_WMDE: UTC afternoon backport window done
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (2/2, production no-op) (duration: 01m 05s)
* 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (1/2, production no-op) (duration: 01m 07s)
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768090{{!}}Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients]] (duration: 01m 10s)
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:38 moritzm: restarting superset for OpenSSL update
* 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
* 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
* 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
* 13:07 mmandere: depool cp1082 for reimage - [[phab:T290005|T290005]]
* 12:58 moritzm: installing bind security updates
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
* 12:29 moritzm: restarting Turnilo for OpenSSL update
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
* 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
* 11:33 moritzm: installing apache security updates on stretch
* 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
* 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
* 10:30 moritzm: restarting ntpd
* 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
* 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
* 09:56 mmandere: depool cp1081 for reimage - [[phab:T290005|T290005]]
* 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
* 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
* 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
* 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:43 moritzm: installing openssl security updates
* 08:36 mmandere: depool cp1079 for reimage - [[phab:T290005|T290005]]
* 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
* 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
* 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:20 ejegg: updated payments-wiki from {{Gerrit|3048f0aa}} to {{Gerrit|28e24856}}
* 00:11 cjming: end running skin preference update script [[phab:T299104|T299104]]
 
== 2022-03-22 ==
* 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:27 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 22:20 ryankemper: [[phab:T301511|T301511]] Mutated cirrus codfw cluster settings to what [I think] they should be, see https://phabricator.wikimedia.org/T301511#7798415; forcing re-check
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22993 and previous config saved to /var/cache/conftool/dbconfig/20220322-221503-marostegui.json
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22992 and previous config saved to /var/cache/conftool/dbconfig/20220322-221455-marostegui.json
* 22:09 ryankemper: [[phab:T301511|T301511]] Forcing recheck of codfw cirrus setting check
* 22:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 22:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
* 21:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22991 and previous config saved to /var/cache/conftool/dbconfig/20220322-215950-marostegui.json
* 21:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
* 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22990 and previous config saved to /var/cache/conftool/dbconfig/20220322-214445-marostegui.json
* 21:39 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:35 ryankemper: [[phab:T301511|T301511]] Fixed elastic* eqiad cross-cluster search settings (see https://phabricator.wikimedia.org/T301511#7798267) to resolve the `ElasticSearch setting check` alerts in eqiad
* 21:33 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22989 and previous config saved to /var/cache/conftool/dbconfig/20220322-212939-marostegui.json
* 21:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:18 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 urbanecm: UTC late backport window done
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce18d4eeb255349e27163d5e5472fbe21c320322}}: testwiki: enable testing of topics match mode for GLAM events ([[phab:T301825|T301825]]) (duration: 01m 06s)
* 20:31 krinkle@deploy1002: Synchronized src/XhguiSaverPdo.php: {{Gerrit|I3882be35572}} (duration: 00m 50s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|17caf0359b99b69c0b3e0d7a5fa2f5c7fb7464ef}}: Enable EventGate logging for WikipediaPortal schema ([[phab:T271163|T271163]]) (duration: 01m 54s)
* 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28278 and previous config saved to /var/cache/conftool/dbconfig/20220522-185021-ladsgroup.json
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P28277 and previous config saved to /var/cache/conftool/dbconfig/20220522-183516-ladsgroup.json
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P28276 and previous config saved to /var/cache/conftool/dbconfig/20220522-182011-ladsgroup.json
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28275 and previous config saved to /var/cache/conftool/dbconfig/20220522-180506-ladsgroup.json
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28274 and previous config saved to /var/cache/conftool/dbconfig/20220522-171444-ladsgroup.json
* 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22986 and previous config saved to /var/cache/conftool/dbconfig/20220322-191049-marostegui.json
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28273 and previous config saved to /var/cache/conftool/dbconfig/20220522-144855-ladsgroup.json
* 19:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 19:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22985 and previous config saved to /var/cache/conftool/dbconfig/20220322-185542-marostegui.json
* 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28272 and previous config saved to /var/cache/conftool/dbconfig/20220522-144847-ladsgroup.json
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22984 and previous config saved to /var/cache/conftool/dbconfig/20220322-184037-marostegui.json
* 14:27 krinkle@deploy1002: Synchronized src/: {{Gerrit|Ia0a6d4794faaafc}} (duration: 00m 50s)
* 18:30 razzi: remove old karapace1001 known hosts following reimage: `razzi@puppetmaster1001:~$ ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "karapace1001.eqiad.wmnet"`
* 14:23 krinkle@deploy1002: Synchronized docroot/noc/: {{Gerrit|Ia0a6d4794faaafc}} (duration: 00m 50s)
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22982 and previous config saved to /var/cache/conftool/dbconfig/20220322-182531-marostegui.json
* 14:18 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|Ia0a6d4794faaafcb}} (2/2) (duration: 00m 42s)
* 18:01 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided) (duration: 05m 16s)
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided)
* 14:14 krinkle@deploy1002: scap failed: average error rate on 3/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 17:50 dcausse@deploy1002: Started scap: (no justification provided)
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1004.eqiad.wmnet with OS bullseye
* 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22981 and previous config saved to /var/cache/conftool/dbconfig/20220322-173301-marostegui.json
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:11 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ia0a6d4794faaafcb}} (1/2) (duration: 00m 50s)
* 17:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22980 and previous config saved to /var/cache/conftool/dbconfig/20220322-173253-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:25 brennen: trainsperiment ([[phab:T300203|T300203]]): with 1.39.0-wmf.3 on all wikis, we're paused for a planned catchup window - nothing to do at the moment, we'll deploy 1.39.0-wmf.4 tomorrow (2022-03-23).
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:02 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I31b1bfb1808b9523}} (duration: 00m 52s)
* 17:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22979 and previous config saved to /var/cache/conftool/dbconfig/20220322-171748-marostegui.json
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:15 taavi: deploy security patch for [[phab:T304354|T304354]]
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
* 13:28 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I3759179dba75a9419}} (duration: 00m 53s)
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
* 13:25 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I97878f8e6}} (duration: 00m 50s)
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22978 and previous config saved to /var/cache/conftool/dbconfig/20220322-170243-marostegui.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22974 and previous config saved to /var/cache/conftool/dbconfig/20220322-164738-marostegui.json
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:35 ebernhardson: [[phab:T303548|T303548]] start wikidatawiki reindexing on eqiad codfw and cloudelastic cirrus clusters
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 13:18 krinkle@deploy1002: Scap failed!: 7/8 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22973 and previous config saved to /var/cache/conftool/dbconfig/20220322-162917-marostegui.json
* 13:17 krinkle@deploy1002: scap failed: average error rate on 7/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28270 and previous config saved to /var/cache/conftool/dbconfig/20220522-122410-ladsgroup.json
* 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28269 and previous config saved to /var/cache/conftool/dbconfig/20220522-122402-ladsgroup.json
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22972 and previous config saved to /var/cache/conftool/dbconfig/20220322-162904-marostegui.json
* 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28267 and previous config saved to /var/cache/conftool/dbconfig/20220522-100436-ladsgroup.json
* 16:27 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 16:27 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28266 and previous config saved to /var/cache/conftool/dbconfig/20220522-100429-ladsgroup.json
* 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28265 and previous config saved to /var/cache/conftool/dbconfig/20220522-095327-ladsgroup.json
* 16:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P28264 and previous config saved to /var/cache/conftool/dbconfig/20220522-093822-ladsgroup.json
* 16:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28263 and previous config saved to /var/cache/conftool/dbconfig/20220522-093619-ladsgroup.json
* 16:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 16:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28262 and previous config saved to /var/cache/conftool/dbconfig/20220522-093611-ladsgroup.json
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22971 and previous config saved to /var/cache/conftool/dbconfig/20220322-161359-marostegui.json
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P28261 and previous config saved to /var/cache/conftool/dbconfig/20220522-092317-ladsgroup.json
* 16:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P28260 and previous config saved to /var/cache/conftool/dbconfig/20220522-092106-ladsgroup.json
* 16:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28259 and previous config saved to /var/cache/conftool/dbconfig/20220522-090811-ladsgroup.json
* 16:11 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P28258 and previous config saved to /var/cache/conftool/dbconfig/20220522-090601-ladsgroup.json
* 16:09 btullis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28257 and previous config saved to /var/cache/conftool/dbconfig/20220522-085056-ladsgroup.json
* 16:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28256 and previous config saved to /var/cache/conftool/dbconfig/20220522-084036-ladsgroup.json
* 16:07 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 15:59 moritzm: imported jvmquake 1.0.1 for stretch/buster (JDK8) and bullseye (JDK11)
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22970 and previous config saved to /var/cache/conftool/dbconfig/20220322-155854-marostegui.json
* 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 15:56 btullis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28255 and previous config saved to /var/cache/conftool/dbconfig/20220522-074303-ladsgroup.json
* 15:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 07:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22969 and previous config saved to /var/cache/conftool/dbconfig/20220322-154349-marostegui.json
* 07:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28254 and previous config saved to /var/cache/conftool/dbconfig/20220522-074255-ladsgroup.json
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28253 and previous config saved to /var/cache/conftool/dbconfig/20220522-064240-ladsgroup.json
* 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22968 and previous config saved to /var/cache/conftool/dbconfig/20220322-152508-marostegui.json
* 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28252 and previous config saved to /var/cache/conftool/dbconfig/20220522-064232-ladsgroup.json
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22967 and previous config saved to /var/cache/conftool/dbconfig/20220322-152247-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P28251 and previous config saved to /var/cache/conftool/dbconfig/20220522-053905-marostegui.json
* 15:17 hashar: Gerrit 3.3.10 up and running [[phab:T304226|T304226]]
* 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28250 and previous config saved to /var/cache/conftool/dbconfig/20220522-042249-ladsgroup.json
* 15:14 hashar: Stopping Gerrit for security update [[phab:T304226|T304226]]
* 04:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 15:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 [[phab:T304226|T304226]] (duration: 00m 10s)
* 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 15:13 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 [[phab:T304226|T304226]]
* 02:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:10 hashar: Upgrading and starting Gerrit on gerrit2001 (replica)
* 02:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28249 and previous config saved to /var/cache/conftool/dbconfig/20220522-002120-ladsgroup.json
* 15:06 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 [[phab:T304226|T304226]] (duration: 00m 12s)
* 00:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:06 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 [[phab:T304226|T304226]]
* 00:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22965 and previous config saved to /var/cache/conftool/dbconfig/20220322-144855-marostegui.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28248 and previous config saved to /var/cache/conftool/dbconfig/20220522-002112-ladsgroup.json
* 14:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P28247 and previous config saved to /var/cache/conftool/dbconfig/20220522-000607-ladsgroup.json
* 14:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 00:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22964 and previous config saved to /var/cache/conftool/dbconfig/20220322-144847-marostegui.json
* 00:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22963 and previous config saved to /var/cache/conftool/dbconfig/20220322-143341-marostegui.json
* 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28246 and previous config saved to /var/cache/conftool/dbconfig/20220522-000225-ladsgroup.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22962 and previous config saved to /var/cache/conftool/dbconfig/20220322-141836-marostegui.json
* 13:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 13:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 13:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22960 and previous config saved to /var/cache/conftool/dbconfig/20220322-134148-marostegui.json
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:40 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 13:40 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 13:36 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
* 13:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
* 13:33 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
* 13:27 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 13:26 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 13:26 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
* 13:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
* 13:20 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
* 13:19 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudgw2002-dev.codfw.wmnet
* 13:19 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
* 12:54 moritzm: installing 5.10.103 kernels on servers running a kernel from buster backports [[phab:T303179|T303179]]
* 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22959 and previous config saved to /var/cache/conftool/dbconfig/20220322-124117-root.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22958 and previous config saved to /var/cache/conftool/dbconfig/20220322-124109-root.json
* 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P22957 and previous config saved to /var/cache/conftool/dbconfig/20220322-123056-marostegui.json
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22956 and previous config saved to /var/cache/conftool/dbconfig/20220322-122613-root.json
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22955 and previous config saved to /var/cache/conftool/dbconfig/20220322-122605-root.json
* 12:24 marostegui: dbmaint s3@eqiad [[phab:T300600|T300600]]
* 12:24 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:772817{{!}}Enable WRITE BOTH on rest of s6 for templatelinks normalization (T299421)]] (duration: 00m 54s)
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:21 marostegui: dbmaint s7@eqiad [[phab:T300992|T300992]]
* 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:18 marostegui: dbmaint s6@eqiad [[phab:T300992|T300992]]
* 12:17 marostegui: dbmaint s5@eqiad [[phab:T300992|T300992]]
* 12:16 marostegui: dbmaint s8@eqiad [[phab:T300992|T300992]]
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:772816{{!}}Enable WRITE BOTH for templatelinks normalization in wikitech (T299421)]] (duration: 01m 41s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22954 and previous config saved to /var/cache/conftool/dbconfig/20220322-121110-root.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22953 and previous config saved to /var/cache/conftool/dbconfig/20220322-121101-root.json
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22952 and previous config saved to /var/cache/conftool/dbconfig/20220322-120123-marostegui.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22951 and previous config saved to /var/cache/conftool/dbconfig/20220322-115606-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22950 and previous config saved to /var/cache/conftool/dbconfig/20220322-115557-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22949 and previous config saved to /var/cache/conftool/dbconfig/20220322-114618-marostegui.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22948 and previous config saved to /var/cache/conftool/dbconfig/20220322-114102-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22946 and previous config saved to /var/cache/conftool/dbconfig/20220322-114051-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22945 and previous config saved to /var/cache/conftool/dbconfig/20220322-113113-marostegui.json
* 11:31 marostegui: Reboot db1100 and db1123 for kernel upgrade before master swap
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 for reboot', diff saved to https://phabricator.wikimedia.org/P22944 and previous config saved to /var/cache/conftool/dbconfig/20220322-113003-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for reboot', diff saved to https://phabricator.wikimedia.org/P22943 and previous config saved to /var/cache/conftool/dbconfig/20220322-112931-marostegui.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22942 and previous config saved to /var/cache/conftool/dbconfig/20220322-111607-marostegui.json
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:46 mmandere: pool cp1077 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS buster
* 10:26 _joe_: running check-restart-php on api appservers
* 10:22 _joe_: running check-and-restart on mw-eqiad-appservers
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22940 and previous config saved to /var/cache/conftool/dbconfig/20220322-101354-marostegui.json
* 10:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22939 and previous config saved to /var/cache/conftool/dbconfig/20220322-101346-marostegui.json
* 10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22938 and previous config saved to /var/cache/conftool/dbconfig/20220322-095841-marostegui.json
* 09:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 09:54 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]] (duration: 62m 07s)
* 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 09:46 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
* 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22937 and previous config saved to /var/cache/conftool/dbconfig/20220322-094335-marostegui.json
* 09:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS buster
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22936 and previous config saved to /var/cache/conftool/dbconfig/20220322-092830-marostegui.json
* 09:25 mmandere: depool cp1077 for reimage - [[phab:T290005|T290005]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P22935 and previous config saved to /var/cache/conftool/dbconfig/20220322-091718-root.json
* 09:11 dcausse: restarted blazegraph on wdqs2002 (deadlocked)
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P22934 and previous config saved to /var/cache/conftool/dbconfig/20220322-090214-root.json
* 08:59 XioNoX: drmrs propagate LVS med to core routers
* 08:52 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22933 and previous config saved to /var/cache/conftool/dbconfig/20220322-084710-root.json
* 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
* 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22932 and previous config saved to /var/cache/conftool/dbconfig/20220322-083206-root.json
* 08:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22931 and previous config saved to /var/cache/conftool/dbconfig/20220322-081806-marostegui.json
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22930 and previous config saved to /var/cache/conftool/dbconfig/20220322-081758-marostegui.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22929 and previous config saved to /var/cache/conftool/dbconfig/20220322-081702-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22928 and previous config saved to /var/cache/conftool/dbconfig/20220322-080713-marostegui.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22927 and previous config saved to /var/cache/conftool/dbconfig/20220322-080253-marostegui.json
* 07:57 urbanecm: UTC morning backport window completed
* 07:57 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/GrowthExperiments/modules/ext.growthExperiments.MentorDashboard/MenteeOverview/MenteeOverviewPresets.js: {{Gerrit|84877bd}}: MenteeOverviewPresets.getUsersToShow: Fix typo ([[phab:T304353|T304353]]) (duration: 00m 49s)
* 07:53 elukey: restart php-fpm on mw1449 - opcache full after deployment
* 07:49 elukey: restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json
* 07:47 elukey: depool mw1448 manually on the node (high cpu usage from php-fpm)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json
* 07:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8151bf2}}: Allow flooders to remove the group from themselves in viwiki ([[phab:T303578|T303578]]) (duration: 00m 50s)
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
* 07:17 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|caad5a4df35c0daa5fd3423e4abf5aa4d5c38a7a}}: wgCrossSiteAJAXdomains: Add foundationwiki and <nowiki>{</nowiki>ee,ge,punjabi<nowiki>}</nowiki>wikimedia ([[phab:T300978|T300978]]) (duration: 00m 49s)
* 07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b4a9935}}: Create "editautopatrolprotected" protection level for viwiki ([[phab:T303579|T303579]]) (duration: 00m 57s)
* 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
* 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
* 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json
* 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json
* 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to dbctl [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json
* 05:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
* 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
* 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 03:47 eileen: civicrm revision changed from {{Gerrit|457adec4}} to {{Gerrit|b6ceb722}}
* 02:56 eileen: civicrm revision changed from {{Gerrit|30c55f51}} to {{Gerrit|457adec4}}
* 02:56 eileen: revision changed from {{Gerrit|30c55f51}} to {{Gerrit|457adec4}}
* 02:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 02:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 01:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 00:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye


== 2022-03-21 ==
== 2022-05-21 ==
* 23:52 eileen: civicrm revision changed from {{Gerrit|52c45874}} to {{Gerrit|30c55f51}}
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P28245 and previous config saved to /var/cache/conftool/dbconfig/20220521-235102-ladsgroup.json
* 22:29 ryankemper: [[phab:T301955|T301955]] Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
* 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28244
* 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 21:59 ryankemper: [[phab:T301955|T301955]] Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation


== 2022-03-20 ==
== 2022-05-20 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
* 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
* 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
* 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
* 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
* 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
* 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
* 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
* 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
* 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json
* 21:37 mutante: correction: mistake was to use FQDN [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
* 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - [[phab:T307142|T307142]]
* 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
* 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update (to update statistics for latest wikipedia kcg) [[phab:T305281|T305281]]
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 16:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti5003.eqsin.wmnet with OS bullseye
* 16:33 robh: troubleshooting ganeti5003 ipmi failure via [[phab:T308211|T308211]]
* 16:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:09 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS bullseye
* 15:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS bullseye
* 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS bullseye
* 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
* 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS bullseye
* 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T', diff saved to https://phabricator.wikimedia.org/P28196 and previous config saved to /var/cache/conftool/dbconfig/20220520-151407-ladsgroup.json
* 15:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28195 and previous config saved to /var/cache/conftool/dbconfig/20220520-150838-root.json
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28194 and previous config saved to /var/cache/conftool/dbconfig/20220520-145334-root.json
* 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS bullseye
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28193 and previous config saved to /var/cache/conftool/dbconfig/20220520-144212-ladsgroup.json
* 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28192 and previous config saved to /var/cache/conftool/dbconfig/20220520-144111-ladsgroup.json
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28191 and previous config saved to /var/cache/conftool/dbconfig/20220520-143830-root.json
* 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28190 and previous config saved to /var/cache/conftool/dbconfig/20220520-142327-root.json
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28189 and previous config saved to /var/cache/conftool/dbconfig/20220520-142032-ladsgroup.json
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28188 and previous config saved to /var/cache/conftool/dbconfig/20220520-141316-ladsgroup.json
* 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28187 and previous config saved to /var/cache/conftool/dbconfig/20220520-141308-ladsgroup.json
* 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS bullseye
* 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28186 and previous config saved to /var/cache/conftool/dbconfig/20220520-140823-root.json
* 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28185 and previous config saved to /var/cache/conftool/dbconfig/20220520-135350-ladsgroup.json
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28184 and previous config saved to /var/cache/conftool/dbconfig/20220520-135319-root.json
* 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28183 and previous config saved to /var/cache/conftool/dbconfig/20220520-134515-ladsgroup.json
* 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 1%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28182 and previous config saved to /var/cache/conftool/dbconfig/20220520-133815-root.json
* 13:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-tls
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=varnish-fe
* 13:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-be
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28181 and previous config saved to /var/cache/conftool/dbconfig/20220520-132307-ladsgroup.json
* 13:15 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
* 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
* 12:42 mforns@deploy1002: Finished deploy [airflow-dags/analytics@51a203f]: (no justification provided) (duration: 00m 07s)
* 12:42 mforns@deploy1002: Started deploy [airflow-dags/analytics@51a203f]: (no justification provided)
* 12:37 moritzm: copy prometheus-mcrouter-exporter from buster-wikimedia to bullseye-wikimedia (needed for [[phab:T308214|T308214]])
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28180 and previous config saved to /var/cache/conftool/dbconfig/20220520-123045-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28179 and previous config saved to /var/cache/conftool/dbconfig/20220520-123037-ladsgroup.json
* 12:23 Amir1: killed refreshlinks suggestion in 10160
* 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28178 and previous config saved to /var/cache/conftool/dbconfig/20220520-121116-ladsgroup.json
* 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28177 and previous config saved to /var/cache/conftool/dbconfig/20220520-114234-ladsgroup.json
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28176 and previous config saved to /var/cache/conftool/dbconfig/20220520-114202-ladsgroup.json
* 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28175 and previous config saved to /var/cache/conftool/dbconfig/20220520-113207-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28174 and previous config saved to /var/cache/conftool/dbconfig/20220520-112449-ladsgroup.json
* 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28173 and previous config saved to /var/cache/conftool/dbconfig/20220520-111239-ladsgroup.json
* 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:09 jynus: drop backupcheck users from m1>dbbackups
* 10:54 moritzm: uploaded cas 6.4.6.3-wmf11u1 to apt.wikimedia.org/bullseye
* 10:52 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 10:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793737{{!}}Revert read new on frwiki for templatelinks migration]] (duration: 00m 51s)
* 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2063.codfw.wmnet with OS bullseye
* 09:39 volans@cumin1001: dbctl commit (dc=all): 'emergency depool', diff saved to https://phabricator.wikimedia.org/P28172 and previous config saved to /var/cache/conftool/dbconfig/20220520-093928-volans.json
* 09:34 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2063.codfw.wmnet with OS bullseye
* 08:54 vgutierrez: re-enabling puppet and repooling cp3060 - [[phab:T308797|T308797]] [[phab:T243167|T243167]]
* 08:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2062.codfw.wmnet with OS bullseye
* 08:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P28171 and previous config saved to /var/cache/conftool/dbconfig/20220520-080719-root.json
* 07:53 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2062.codfw.wmnet with OS bullseye
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P28170 and previous config saved to /var/cache/conftool/dbconfig/20220520-075215-root.json
* 07:52 jayme: imported kubeconform 0.4.13-1 to buster-,bullseye-wikimedia - [[phab:T306165|T306165]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P28169 and previous config saved to /var/cache/conftool/dbconfig/20220520-073712-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P28168 and previous config saved to /var/cache/conftool/dbconfig/20220520-072208-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P28167 and previous config saved to /var/cache/conftool/dbconfig/20220520-070704-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P28166 and previous config saved to /var/cache/conftool/dbconfig/20220520-065200-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P28164 and previous config saved to /var/cache/conftool/dbconfig/20220520-063656-root.json
* 06:03 moritzm: racadm racreset on ganeti5003
* 05:09 marostegui: dbmaint s1@eqiad [[phab:T298554|T298554]]
* 01:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28162 and previous config saved to /var/cache/conftool/dbconfig/20220520-010743-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28161 and previous config saved to /var/cache/conftool/dbconfig/20220520-005237-ladsgroup.json
* 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bullseye
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28160 and previous config saved to /var/cache/conftool/dbconfig/20220520-003732-ladsgroup.json
* 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28159 and previous config saved to /var/cache/conftool/dbconfig/20220520-002227-ladsgroup.json


== 2022-03-19 ==
== 2022-05-19 ==
* 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
* 23:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netmon1003.wikimedia.org with OS bullseye
* 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days
* 22:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:07 robh: cp3060 idrac interface frozen, rebooted via power outlet control on [[phab:T243167|T243167]]
* 20:49 thcipriani: UTC late deploys done
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 bking@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:793128{{!}}zhwikiversity: Optimize logo per commons files (T308620)]] (duration: 00m 51s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 bking@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:792985


== 2022-03-17 ==
== 2022-05-18 ==
* 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28009 and previous config saved to /var/cache/conftool/dbconfig/20220518-235759-ladsgroup.json
* 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:53 mutante: webperf1001 - systemctl reset-failed
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:53 mutante: webperf1001/webperf2001 - re-enabling notifications in icinga that were disabled without comment (please don't do this, they keep being forgotten on a regular basis)
* 22:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]]
* 23:49 mutante: seaborgium - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:48 mutante: ms-be1063 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1054 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1036 - broken systemd state in Icinga since 15d - systemctl reset-failed
* 23:45 mutante: dumpsdata1002 - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:44 mutante: deploy2002 - broken systemd state in Icinga since 42d - systemctl reset-failed
* 23:43 mutante: an-db1002 - broken systemd state in Icinga since 48d - systemctl reset-failed
* 23:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P28008 and previous config saved to /var/cache/conftool/dbconfig/20220518-234254-ladsgroup.json
* 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P28007 and previous config saved to /var/cache/conftool/dbconfig/20220518-232749-ladsgroup.json
* 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28006 and previous config saved to /var/cache/conftool/dbconfig/20220518-232704-ladsgroup.json
* 23:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28005 and previous config saved to /var/cache/conftool/dbconfig/20220518-232656-ladsgroup.json
* 23:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: exim debug log capture
* 23:16 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: exim debug log capture
* 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28004 and previous config saved to /var/cache/conftool/dbconfig/20220518-231244-ladsgroup.json
* 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28003 and previous config saved to /var/cache/conftool/dbconfig/20220518-231151-ladsgroup.json
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28002 and previous config saved to /var/cache/conftool/dbconfig/20220518-230956-ladsgroup.json
* 23:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 23:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28001 and previous config saved to /var/cache/conftool/dbconfig/20220518-230948-ladsgroup.json
* 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28000 and previous config saved to /var/cache/conftool/dbconfig/20220518-225646-ladsgroup.json
* 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P27999 and previous config saved to /var/cache/conftool/dbconfig/20220518-225443-ladsgroup.json
* 22:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:46 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/resources/src/mediawiki.htmlform/cond-state.js: Backport: [[gerrit:793146{{!}}mw.htmlform: Fix conditional hide/disable for non-OOUI forms (T308626)]] (duration: 00m 51s)
* 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27998 and previous config saved to /var/cache/conftool/dbconfig/20220518-224141-ladsgroup.json
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P27997 and previous config saved to /var/cache/conftool/dbconfig/20220518-223938-ladsgroup.json
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:28 derick@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: [[gerrit:771606{{!}}Add new field to capture application URL link on Meta]] (duration: 00m 50s)
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:30 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/includes/parser/ParserObserver.php: Backport: [[gerrit:792665{{!}}parser: Avoid pushing the whole content to ParserObserver debug log (T305218)]] (duration: 00m 52s)
* 22:17 derick@deploy1002: Finished scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]] (duration: 11m 37s)
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27996 and previous config saved to /var/cache/conftool/dbconfig/20220518-222433-ladsgroup.json
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27995 and previous config saved to /var/cache/conftool/dbconfig/20220518-222145-ladsgroup.json
* 22:05 derick@deploy1002: Started scap: Backport: [[gerrit:771665{{!}}Add & improve message for the chapter/thorg application contact form]]
* 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:00 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771712{{!}}Revert "Revert "Revert "Enable Parsoid API everywhere"""]] (duration: 00m 51s)
* 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27994 and previous config saved to /var/cache/conftool/dbconfig/20220518-222132-ladsgroup.json
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27993 and previous config saved to /var/cache/conftool/dbconfig/20220518-221344-ladsgroup.json
* 21:48 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771707{{!}}Revert "Revert "Enable Parsoid API everywhere""]] (duration: 00m 51s)
* 22:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 21:45 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27992 and previous config saved to /var/cache/conftool/dbconfig/20220518-221331-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P27991 and previous config saved to /var/cache/conftool/dbconfig/20220518-220627-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27990 and previous config saved to /var/cache/conftool/dbconfig/20220518-215826-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P27989 and previous config saved to /var/cache/conftool/dbconfig/20220518-215122-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27988 and previous config saved to /var/cache/conftool/dbconfig/20220518-214321-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27987 and previous config saved to /var/cache/conftool/dbconfig/20220518-213617-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 21:29 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I0b6171b5452b}} (duration: 00m 55s)
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27986 and previous config saved to /var/cache/conftool/dbconfig/20220518-212926-ladsgroup.json
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27985 and previous config saved to /var/cache/conftool/dbconfig/20220518-212918-ladsgroup.json
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27984 and previous config saved to /var/cache/conftool/dbconfig/20220518-212815-ladsgroup.json
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27983 and previous config saved to /var/cache/conftool/dbconfig/20220518-211413-ladsgroup.json
* 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:40 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:35 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27982 and previous config saved to /var/cache/conftool/dbconfig/20220518-210017-ladsgroup.json
* 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27981 and previous config saved to /var/cache/conftool/dbconfig/20220518-210009-ladsgroup.json
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27980 and previous config saved to /var/cache/conftool/dbconfig/20220518-205908-ladsgroup.json
* 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27979 and previous config saved to /var/cache/conftool/dbconfig/20220518-204504-ladsgroup.json
* 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27978 and previous config saved to /var/cache/conftool/dbconfig/20220518-204403-ladsgroup.json
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27977 and previous config saved to /var/cache/conftool/dbconfig/20220518-203420-ladsgroup.json
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27976 and previous config saved to /var/cache/conftool/dbconfig/20220518-203412-ladsgroup.json
* 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27975 and previous config saved to /var/cache/conftool/dbconfig/20220518-202959-ladsgroup.json
* 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:23 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:21 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: [[gerrit:771394{{!}}Update invalid skin preference update script (T299104)]] (duration: 00m 51s)
* 20:20 cjming: end of UTC late backport window
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27974 and previous config saved to /var/cache/conftool/dbconfig/20220518-201907-ladsgroup.json
* 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27973 and previous config saved to /var/cache/conftool/dbconfig/20220518-201454-ladsgroup.json
* 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:14 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:793033{{!}}zhwiktionary: Declare commons files for logo (T308620)]] (duration: 00m 51s)
* 21:11 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]] (duration: 00m 50s)
* 20:13 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:793033{{!}}zhwiktionary: Declare commons files for logo (T308620)]] (duration: 00m 51s)
* 21:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.26 refs [[phab:T300202|T300202]]
* 20:12 cjming@deploy1002: Synchronized static/images/project-logos/zhwiktionary.png: Config: [[gerrit:793033{{!}}zhwiktionary: Declare commons files for logo (T308620)]] (duration: 00m 52s)
* 20:57 ladsgroup@deploy1002: Finished scap: Revert "rdbms: Followups to automatic connection recovery patch" (duration: 11m 50s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized static/images/project-logos/zhwiktionary-2x.png: Config: [[gerrit:793033{{!}}zhwiktionary: Declare commons files for logo (T308620)]] (duration: 00m 52s)
* 20:10 cjming@deploy1002: Synchronized static/images/project-logos/zhwiktionary-1.5x.png: Config: [[gerrit:793033{{!}}zhwiktionary: Declare commons files for logo (T308620)]] (duration: 00m 52s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793098{{!}}zhwiki: Comment amendment for restricting "flow-hide" to autoconfirmed (T264489)]] (duration: 00m 52s)
* 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27972 and previous config saved to /var/cache/conftool/dbconfig/20220518-200402-ladsgroup.json
* 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27971 and previous config saved to /var/cache/conftool/dbconfig/20220518-194857-ladsgroup.json
* 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27970 and previous config saved to /var/cache/conftool/dbconfig/20220518-194701-ladsgroup.json
* 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27969 and previous config saved to /var/cache/conftool/dbconfig/20220518-194504-ladsgroup.json
* 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 19:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: exim debug log capture
* 19:24 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: exim debug log capture
* 19:23 jhathaway: capturing debug logs on mx2001.wikimedia.org
* 19:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maint
* 19:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maint
* 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27967 and previous config saved to /var/cache/conftool/dbconfig/20220518-181654-ladsgroup.json
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27966 and previous config saved to /var/cache/conftool/dbconfig/20220518-180149-ladsgroup.json
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27965 and previous config saved to /var/cache/conftool/dbconfig/20220518-174644-ladsgroup.json
* 17:40 mforns@deploy1002: Finished deploy [airflow-dags/analytics@ad59116]: (no justification provided) (duration: 00m 07s)
* 17:40 mforns@deploy1002: Started deploy [airflow-dags/analytics@ad59116]: (no justification provided)
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27964 and previous config saved to /var/cache/conftool/dbconfig/20220518-173139-ladsgroup.json
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27963 and previous config saved to /var/cache/conftool/dbconfig/20220518-164256-ladsgroup.json
* 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27962 and previous config saved to /var/cache/conftool/dbconfig/20220518-164248-ladsgroup.json
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27961 and previous config saved to /var/cache/conftool/dbconfig/20220518-162743-ladsgroup.json
* 16:22 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1011.eqiad.wmnet with reason: Setting up turnilo for the first time, there will be errors
* 16:22 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1011.eqiad.wmnet with reason: Setting up turnilo for the first time, there will be errors
* 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27960 and previous config saved to /var/cache/conftool/dbconfig/20220518-161238-ladsgroup.json
* 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27959 and previous config saved to /var/cache/conftool/dbconfig/20220518-155733-ladsgroup.json
* 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:36 Amir1: promoted user:Ladsgroup to admin of testcommonswiki
* 15:32 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/CommonsMetadata/src: Backport: [[gerrit:792659{{!}}Return early if the ParserOutput doesn't have any text (T308663)]] (duration: 00m 52s)
* 15:15 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3072d55]: (no justification provided) (duration: 00m 07s)
* 15:15 mforns@deploy1002: Started deploy [airflow-dags/analytics@3072d55]: (no justification provided)
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27957 and previous config saved to /var/cache/conftool/dbconfig/20220518-150722-ladsgroup.json
* 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27956 and previous config saved to /var/cache/conftool/dbconfig/20220518-150714-ladsgroup.json
* 15:04 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 15:04 vgutierrez: rolling upgrade to HAProxy 2.4.17 in eqiad - [[phab:T307444|T307444]]
* 15:03 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 14:56 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 14:56 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27955 and previous config saved to /var/cache/conftool/dbconfig/20220518-145603-ladsgroup.json
* 14:55 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27954 and previous config saved to /var/cache/conftool/dbconfig/20220518-145208-ladsgroup.json
* 14:45 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Set commonswiki to 1.39.0-wmf.12
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27952 and previous config saved to /var/cache/conftool/dbconfig/20220518-144058-ladsgroup.json
* 14:39 jnuche@deploy1002: scap failed: average error rate on 6/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27951 and previous config saved to /var/cache/conftool/dbconfig/20220518-143703-ladsgroup.json
* 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27949 and previous config saved to /var/cache/conftool/dbconfig/20220518-142553-ladsgroup.json
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27948 and previous config saved to /var/cache/conftool/dbconfig/20220518-142158-ladsgroup.json
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27947 and previous config saved to /var/cache/conftool/dbconfig/20220518-141048-ladsgroup.json
* 14:10 vgutierrez: rolling upgrade to HAProxy 2.4.17 in esams - [[phab:T307444|T307444]]
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27946 and previous config saved to /var/cache/conftool/dbconfig/20220518-140812-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27945 and previous config saved to /var/cache/conftool/dbconfig/20220518-140804-ladsgroup.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27944 and previous config saved to /var/cache/conftool/dbconfig/20220518-135259-ladsgroup.json
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:44 jforrester@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 53s)
* 13:43 jforrester@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 52s)
* 13:42 jforrester@deploy1002: Synchronized w/health-check.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 52s)
* 13:40 jforrester@deploy1002: Synchronized rpc/RunJobs.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 51s)
* 13:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2060.codfw.wmnet with OS bullseye
* 13:39 jforrester@deploy1002: Synchronized docroot/noc/conf/highlight.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 51s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ns-recursor1.openstack.codfw1dev.wikimediacloud.org on all recursors
* 13:39 volans@cumin1001: START - Cookbook sre.dns.wipe-cache ns-recursor1.openstack.codfw1dev.wikimediacloud.org on all recursors
* 13:39 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ns-recursor0.openstack.codfw1dev.wikimediacloud.org on all recursors
* 13:39 volans@cumin1001: START - Cookbook sre.dns.wipe-cache ns-recursor0.openstack.codfw1dev.wikimediacloud.org on all recursors
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 jforrester@deploy1002: Synchronized docroot/wwwportal/w/search-redirect.php: Config: [[gerrit:740304{{!}}Make use of the ?? operator in more trivial situations]] (duration: 00m 51s)
* 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27943 and previous config saved to /var/cache/conftool/dbconfig/20220518-133753-ladsgroup.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:34 vgutierrez: rolling upgrade to HAProxy 2.4.17 in codfw - [[phab:T307444|T307444]]
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27942 and previous config saved to /var/cache/conftool/dbconfig/20220518-133231-ladsgroup.json
* 13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27941 and previous config saved to /var/cache/conftool/dbconfig/20220518-133223-ladsgroup.json
* 13:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:27 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771621{{!}}Allow wikifunctions.org to use the CAPTCHA system]] (duration: 00m 52s)
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2060.codfw.wmnet with reason: host reimage
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27940 and previous config saved to /var/cache/conftool/dbconfig/20220518-132248-ladsgroup.json
* 13:22 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791787{{!}}InitialiseSettings: Enable SandboxLink for uzwiki (T308399)]] (duration: 00m 53s)
* 13:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2060.codfw.wmnet with reason: host reimage
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27939 and previous config saved to /var/cache/conftool/dbconfig/20220518-132011-ladsgroup.json
* 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 13:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27938 and previous config saved to /var/cache/conftool/dbconfig/20220518-132002-ladsgroup.json
* 13:18 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771620{{!}}Allow wikifunctions.org URLs to be used in the URL Shortener]] (duration: 00m 54s)
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27937 and previous config saved to /var/cache/conftool/dbconfig/20220518-131718-ladsgroup.json
* 13:15 jforrester@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/GrowthExperiments: Backport: [[gerrit:792655{{!}}Campaign templates: show legal footer on mobile (T307521)]] (duration: 00m 53s)
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 jforrester@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:677327{{!}}Disable LocalisationUpdate, part III (T158360)]] (duration: 00m 53s)
* 13:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:06 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:677326{{!}}Disable LocalisationUpdate, part II (T158360)]] (duration: 00m 52s)
* 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27936 and previous config saved to /var/cache/conftool/dbconfig/20220518-130457-ladsgroup.json
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:02 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792737{{!}}[shnwiki] Enable the SandboxLink extension (T308623)]] (duration: 00m 53s)
* 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27935 and previous config saved to /var/cache/conftool/dbconfig/20220518-130213-ladsgroup.json
* 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27934 and previous config saved to /var/cache/conftool/dbconfig/20220518-124952-ladsgroup.json
* 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27933 and previous config saved to /var/cache/conftool/dbconfig/20220518-124708-ladsgroup.json
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2060.codfw.wmnet with OS bullseye
* 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27932 and previous config saved to /var/cache/conftool/dbconfig/20220518-123447-ladsgroup.json
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27931 and previous config saved to /var/cache/conftool/dbconfig/20220518-123211-ladsgroup.json
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27930 and previous config saved to /var/cache/conftool/dbconfig/20220518-123158-ladsgroup.json
* 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27929 and previous config saved to /var/cache/conftool/dbconfig/20220518-121653-ladsgroup.json
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27928 and previous config saved to /var/cache/conftool/dbconfig/20220518-120209-ladsgroup.json
* 12:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27927 and previous config saved to /var/cache/conftool/dbconfig/20220518-120148-ladsgroup.json
* 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27925 and previous config saved to /var/cache/conftool/dbconfig/20220518-114643-ladsgroup.json
* 11:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 11:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 11:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 11:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2059.codfw.wmnet with OS bullseye
* 11:00 vgutierrez: rolling upgrade to HAProxy 2.4.17 in drmrs - [[phab:T307444|T307444]]
* 10:59 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:59 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:58 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:56 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27924 and previous config saved to /var/cache/conftool/dbconfig/20220518-105046-ladsgroup.json
* 10:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2059.codfw.wmnet with reason: host reimage
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27923 and previous config saved to /var/cache/conftool/dbconfig/20220518-104628-ladsgroup.json
* 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27922 and previous config saved to /var/cache/conftool/dbconfig/20220518-104620-ladsgroup.json
* 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2059.codfw.wmnet with reason: host reimage
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
* 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27921 and previous config saved to /var/cache/conftool/dbconfig/20220518-103541-ladsgroup.json
* 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27920 and previous config saved to /var/cache/conftool/dbconfig/20220518-103115-ladsgroup.json
* 10:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2059.codfw.wmnet with OS bullseye
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27919 and previous config saved to /var/cache/conftool/dbconfig/20220518-102036-ladsgroup.json
* 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27918 and previous config saved to /var/cache/conftool/dbconfig/20220518-101610-ladsgroup.json
* 10:14 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host backupmon1001.eqiad.wmnet
* 10:06 marostegui: Reboot dbproxy2* for kernel upgrade [[phab:T307673|T307673]]
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27917 and previous config saved to /var/cache/conftool/dbconfig/20220518-100531-ladsgroup.json
* 10:04 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27915 and previous config saved to /var/cache/conftool/dbconfig/20220518-100105-ladsgroup.json
* 09:54 root@cumin1001: START - Cookbook sre.dns.netbox
* 09:54 root@cumin1001: START - Cookbook sre.ganeti.makevm for new host backupmon1001.eqiad.wmnet
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27914 and previous config saved to /var/cache/conftool/dbconfig/20220518-095442-ladsgroup.json
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:46 dcausse: [[phab:T308647|T308647]]: banning elastic2054 from production-search-psi-codfw and elastic2054-production-search-codfw
* 09:45 vgutierrez: rolling upgrade to HAProxy 2.4.17 in eqsin - [[phab:T307444|T307444]]
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 09:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 09:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27913 and previous config saved to /var/cache/conftool/dbconfig/20220518-094106-ladsgroup.json
* 09:27 dcausse: depooling elastic2054 seeing hardware errors (Hardware error from APEI Generic Hardware Error Source: 65534)
* 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27912 and previous config saved to /var/cache/conftool/dbconfig/20220518-092601-ladsgroup.json
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 09:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:17 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27911 and previous config saved to /var/cache/conftool/dbconfig/20220518-091544-ladsgroup.json
* 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2056.codfw.wmnet with OS bullseye
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27910 and previous config saved to /var/cache/conftool/dbconfig/20220518-091056-ladsgroup.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:08 hashar: Restarting CI Jenkins once more
* 09:06 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/GeoData/includes/Searcher.php: Backport: [[gerrit:792652{{!}}Remove reference to Elastica\Type (T308044)]] (duration: 00m 52s)
* 09:05 vgutierrez: rolling upgrade to HAProxy 2.4.17 in ulsfo
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4003.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4003.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 09:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27909 and previous config saved to /var/cache/conftool/dbconfig/20220518-085551-ladsgroup.json
* 08:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27908 and previous config saved to /var/cache/conftool/dbconfig/20220518-084910-ladsgroup.json
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27907 and previous config saved to /var/cache/conftool/dbconfig/20220518-084902-ladsgroup.json
* 08:41 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
* 08:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27906 and previous config saved to /var/cache/conftool/dbconfig/20220518-083357-ladsgroup.json
* 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4003.ulsfo.wmnet with OS bullseye
* 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27905 and previous config saved to /var/cache/conftool/dbconfig/20220518-083022-ladsgroup.json
* 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:27 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]] (duration: 00m 53s)
* 08:26 moritzm: drain ganeti5002 [[phab:T308211|T308211]]
* 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:26 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]]
* 08:25 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/eqsin [[phab:T308211|T308211]]
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:24 hashar: CI Jenkins hosts are all back and operational
* 08:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2056.codfw.wmnet with reason: host reimage
* 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4003.ulsfo.wmnet with reason: host reimage
* 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27904 and previous config saved to /var/cache/conftool/dbconfig/20220518-081852-ladsgroup.json
* 08:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2056.codfw.wmnet with reason: host reimage
* 08:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4003.ulsfo.wmnet with reason: host reimage
* 08:12 jnuche@deploy1002: deploy-promote aborted:  (duration: 03m 02s)
* 08:11 hashar: Jenkins CI is down, can't connect to the agents
* 08:11 moritzm: upgrading ganeti packages in eqsin to Ganeti 3.0 [[phab:T308211|T308211]]
* 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27903 and previous config saved to /var/cache/conftool/dbconfig/20220518-080347-ladsgroup.json
* 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27902 and previous config saved to /var/cache/conftool/dbconfig/20220518-080339-ladsgroup.json
* 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 08:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 08:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2056.codfw.wmnet with OS bullseye
* 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4003.ulsfo.wmnet with OS bullseye
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27900 and previous config saved to /var/cache/conftool/dbconfig/20220518-075826-ladsgroup.json
* 07:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P27898 and previous config saved to /var/cache/conftool/dbconfig/20220518-075620-ladsgroup.json
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:54 hashar: Restarting CI Jenkins
* 07:41 moritzm: imported jenkins 2.332.3 to thirdparty/ci for buster-wikimedia
* 07:36 dcausse: closing UTC morning backport window
* 07:34 dcausse@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/WikibaseCirrusSearch/src/Query/HasLicenseFeature.php: Backport: [[gerrit:792650{{!}}haslicense: Apply minimum_should_match for elastic 7.x (T288765)]] (duration: 00m 52s)
* 07:32 dcausse@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/CirrusSearch/includes/Query/FullTextSimpleMatchQueryBuilder.php: Backport: [[gerrit:792649{{!}}Resolve minimum_should_match warnings during random scoring (T288765)]] (duration: 00m 56s)
* 07:30 hashar: Restarting CI Jenkins
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
* 07:17 marostegui: Cold reset wtp1045.mgmt ipmi
* 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
* 01:05 ejegg: updated fundraising CiviCRM from {{Gerrit|d45afdfc}} to {{Gerrit|b8b8c177}}
 
== 2022-05-17 ==
* 23:36 ejegg: updated payments-wiki from {{Gerrit|590fac28}} to {{Gerrit|d9d63a3d}}
* 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27896 and previous config saved to /var/cache/conftool/dbconfig/20220517-222904-ladsgroup.json
* 22:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:16 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: {{Gerrit|c2151b3}}: Update interwiki cache (duration: 00m 52s)
* 22:15 urbanecm@deploy1002: Synchronized langlist: {{Gerrit|cd704d4f}}: langlist: add kcg language ([[phab:T305279|T305279]]) (duration: 00m 53s)
* 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P27895 and previous config saved to /var/cache/conftool/dbconfig/20220517-221359-ladsgroup.json
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P27894 and previous config saved to /var/cache/conftool/dbconfig/20220517-215854-ladsgroup.json
* 21:52 mutante: alert1001 - systemctl start certspotter (after alert that the unit was failed. happens sometimes)
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27893 and previous config saved to /var/cache/conftool/dbconfig/20220517-214349-ladsgroup.json
* 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27892 and previous config saved to /var/cache/conftool/dbconfig/20220517-212530-ladsgroup.json
* 21:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 21:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27891 and previous config saved to /var/cache/conftool/dbconfig/20220517-212316-ladsgroup.json
* 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27890 and previous config saved to /var/cache/conftool/dbconfig/20220517-212040-ladsgroup.json
* 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27889 and previous config saved to /var/cache/conftool/dbconfig/20220517-210535-ladsgroup.json
* 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P27888 and previous config saved to /var/cache/conftool/dbconfig/20220517-205030-ladsgroup.json
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 cjming: end of UTC late backport & config window
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:22 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 53s)
* 20:21 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 52s)
* 20:20 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity-2x.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 53s)
* 20:19 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity-1.5x.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 56s)
* 20:18 cjming@deploy1002: Synchronized static/images/project-logos/betawikiversity.png: Config: [[gerrit:792710{{!}}betawikiversity: HIDPI support for logo (T308604)]] (duration: 00m 54s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792272{{!}}Deploy TOC A/B test to pilot wikis except frwiki, ptwiki (T306607)]] (duration: 00m 53s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:44 bd808: Updated Toolhub to 42072d, applied db migrations, and rebuilt search indexes
* 19:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 19:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 19:29 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 19:28 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 19:26 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 19:25 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maint
* 18:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maint
* 18:26 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-tool1011.eqiad.wmnet
* 18:16 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:58 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-tool1011.eqiad.wmnet
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27884 and previous config saved to /var/cache/conftool/dbconfig/20220517-172632-ladsgroup.json
* 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27883 and previous config saved to /var/cache/conftool/dbconfig/20220517-172521-ladsgroup.json
* 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27882 and previous config saved to /var/cache/conftool/dbconfig/20220517-172001-ladsgroup.json
* 17:16 robh: ganeti4003 rebooting for firmware updates via [[phab:T307997|T307997]]
* 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 17:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27881 and previous config saved to /var/cache/conftool/dbconfig/20220517-170456-ladsgroup.json
* 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27880 and previous config saved to /var/cache/conftool/dbconfig/20220517-164951-ladsgroup.json
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27878 and previous config saved to /var/cache/conftool/dbconfig/20220517-163446-ladsgroup.json
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27877 and previous config saved to /var/cache/conftool/dbconfig/20220517-163024-ladsgroup.json
* 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Manual repool', diff saved to https://phabricator.wikimedia.org/P27876 and previous config saved to /var/cache/conftool/dbconfig/20220517-162835-ladsgroup.json
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P27875 and previous config saved to /var/cache/conftool/dbconfig/20220517-162738-ladsgroup.json
* 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27874 and previous config saved to /var/cache/conftool/dbconfig/20220517-154502-ladsgroup.json
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27873 and previous config saved to /var/cache/conftool/dbconfig/20220517-154310-ladsgroup.json
* 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27872 and previous config saved to /var/cache/conftool/dbconfig/20220517-153921-ladsgroup.json
* 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27871 and previous config saved to /var/cache/conftool/dbconfig/20220517-152416-ladsgroup.json
* 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27870 and previous config saved to /var/cache/conftool/dbconfig/20220517-150911-ladsgroup.json
* 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27869 and previous config saved to /var/cache/conftool/dbconfig/20220517-145406-ladsgroup.json
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27868 and previous config saved to /var/cache/conftool/dbconfig/20220517-144959-ladsgroup.json
* 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27867 and previous config saved to /var/cache/conftool/dbconfig/20220517-144946-ladsgroup.json
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27865 and previous config saved to /var/cache/conftool/dbconfig/20220517-143916-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27864 and previous config saved to /var/cache/conftool/dbconfig/20220517-143441-ladsgroup.json
* 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27863 and previous config saved to /var/cache/conftool/dbconfig/20220517-142411-ladsgroup.json
* 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27862 and previous config saved to /var/cache/conftool/dbconfig/20220517-141936-ladsgroup.json
* 14:19 hnowlan@deploy1002: Finished deploy [restbase/deploy@6e39559]: Add kcgwiki - [[phab:T305281|T305281]] (duration: 119m 34s)
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27861 and previous config saved to /var/cache/conftool/dbconfig/20220517-140906-ladsgroup.json
* 14:08 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 14:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
* 14:06 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 14:05 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27860 and previous config saved to /var/cache/conftool/dbconfig/20220517-140431-ladsgroup.json
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27859 and previous config saved to /var/cache/conftool/dbconfig/20220517-140016-ladsgroup.json
* 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27858 and previous config saved to /var/cache/conftool/dbconfig/20220517-140008-ladsgroup.json
* 13:55 tgr@deploy1002: Finished scap: Backport with i18n changes: [[gerrit:792478{{!}}Account creation: add Thank you banner texts]] (duration: 14m 57s)
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27857 and previous config saved to /var/cache/conftool/dbconfig/20220517-135401-ladsgroup.json
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27856 and previous config saved to /var/cache/conftool/dbconfig/20220517-135006-ladsgroup.json
* 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 13:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27855 and previous config saved to /var/cache/conftool/dbconfig/20220517-134838-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27854 and previous config saved to /var/cache/conftool/dbconfig/20220517-134503-ladsgroup.json
* 13:40 tgr@deploy1002: Started scap: Backport with i18n changes: [[gerrit:792478{{!}}Account creation: add Thank you banner texts]]
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27853 and previous config saved to /var/cache/conftool/dbconfig/20220517-133333-ladsgroup.json
* 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27852 and previous config saved to /var/cache/conftool/dbconfig/20220517-132958-ladsgroup.json
* 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27851 and previous config saved to /var/cache/conftool/dbconfig/20220517-131827-ladsgroup.json
* 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27850 and previous config saved to /var/cache/conftool/dbconfig/20220517-131453-ladsgroup.json
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27849 and previous config saved to /var/cache/conftool/dbconfig/20220517-131040-ladsgroup.json
* 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27848 and previous config saved to /var/cache/conftool/dbconfig/20220517-131032-ladsgroup.json
* 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27846 and previous config saved to /var/cache/conftool/dbconfig/20220517-130322-ladsgroup.json
* 13:02 Amir1: killed cawiki's refreshLinkRecommendations.php ([[phab:T299021|T299021]])
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27845 and previous config saved to /var/cache/conftool/dbconfig/20220517-125713-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27844 and previous config saved to /var/cache/conftool/dbconfig/20220517-125527-ladsgroup.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P27843 and previous config saved to /var/cache/conftool/dbconfig/20220517-124227-ladsgroup.json
* 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27842 and previous config saved to /var/cache/conftool/dbconfig/20220517-124022-ladsgroup.json
* 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27841 and previous config saved to /var/cache/conftool/dbconfig/20220517-122517-ladsgroup.json
* 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P27840 and previous config saved to /var/cache/conftool/dbconfig/20220517-122201-ladsgroup.json
* 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 12:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 12:20 hnowlan@deploy1002: Started deploy [restbase/deploy@6e39559]: Add kcgwiki - [[phab:T305281|T305281]]
* 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:04 moritzm: draining ganeti4003 [[phab:T307997|T307997]]
* 11:53 moritzm: failover Ganeti master in ulsfo to ganeti4001 [[phab:T307997|T307997]]
* 10:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4002.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 10:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4002.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After depooling', diff saved to https://phabricator.wikimedia.org/P27838 and previous config saved to /var/cache/conftool/dbconfig/20220517-100223-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After depooling', diff saved to https://phabricator.wikimedia.org/P27837 and previous config saved to /var/cache/conftool/dbconfig/20220517-094719-root.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After depooling', diff saved to https://phabricator.wikimedia.org/P27836 and previous config saved to /var/cache/conftool/dbconfig/20220517-093216-root.json
* 09:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4002.ulsfo.wmnet with OS bullseye
* 09:20 XioNoX: all switches, split configuration per interfaces (use new get_junos_interfaces function)
* 09:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After depooling', diff saved to https://phabricator.wikimedia.org/P27835 and previous config saved to /var/cache/conftool/dbconfig/20220517-091712-root.json
* 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:16 btullis@deploy1002: Finished deploy [analytics/turnilo/deploy@bf60521]: (no justification provided) (duration: 00m 03s)
* 09:16 btullis@deploy1002: Started deploy [analytics/turnilo/deploy@bf60521]: (no justification provided)
* 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4002.ulsfo.wmnet with reason: host reimage
* 09:05 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4002.ulsfo.wmnet with reason: host reimage
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After depooling', diff saved to https://phabricator.wikimedia.org/P27834 and previous config saved to /var/cache/conftool/dbconfig/20220517-090208-root.json
* 08:59 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/specials/pagers/ContribsPager.php: Backport: [[gerrit:792474{{!}}ContribsPager: Update index hint to use revision table in READ NEW (T307295)]] (duration: 00m 53s)
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:54 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.12/includes/specials/pagers/ContribsPager.php: Backport: [[gerrit:792475{{!}}ContribsPager: Update index hint to use revision table in READ NEW (T307295)]] (duration: 00m 56s)
* 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4002.ulsfo.wmnet with OS bullseye
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: After depooling', diff saved to https://phabricator.wikimedia.org/P27833 and previous config saved to /var/cache/conftool/dbconfig/20220517-084704-root.json
* 08:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:40 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792565{{!}}Turn on read new for templatelinks on frwiki (T306673)]] (duration: 02m 25s)
* 08:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b569ee8]: Update DAG spark conf [airflow-dags/analytics@b569ee8] (duration: 00m 07s)
* 08:21 aqu@deploy1002: Started deploy [airflow-dags/analytics@b569ee8]: Update DAG spark conf [airflow-dags/analytics@b569ee8]
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:08 moritzm: installing ffmpeg security updates on stretch
* 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:06 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]]
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:53 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]] (duration: 14m 35s)
* 07:39 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]]
* 07:36 kart_: UTC morning backport window - Done.
* 07:36 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791481{{!}}Enable Section Translation in bcl, is, ne, pa, ts and ur Wikipedias (T304828)]] (duration: 00m 53s)
* 07:35 jnuche@deploy1002: stage-train aborted:  (duration: 25m 33s)
* 07:35 jnuche@deploy1002: deploy-promote aborted:  (duration: 14m 44s)
* 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:22 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.12  refs [[phab:T305218|T305218]]
* 07:20 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791315{{!}}Deploy template search improvements to enwiki (T303802)]] (duration: 02m 11s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:17 XioNoX: core routers, split configuration per interfaces (use new get_junos_interfaces function)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791314{{!}}Deploy VE template dialog improvements to enwiki (T306967)]] (duration: 00m 50s)
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 XioNoX: management routers, split configuration per interfaces (use new get_junos_interfaces function)
* 06:37 XioNoX: management switches, split configuration per interfaces (use new get_junos_interfaces function)
* 05:44 _joe_: restarted rsyslog on kubernetes2022
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-05-16 ==
* 22:14 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 22:14 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 21:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:47 robh: ganeti4002 rebooting for firmware update via [[phab:T307997|T307997]]
* 21:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:26 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:14 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:07 cstone: civicrm revision changed from {{Gerrit|6d85f1cc}} to {{Gerrit|d45afdfc}}
* 21:05 mutante: gerrit2002 (in setup) - rebooting
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 ladsgroup@deploy1002: Started scap: Revert "rdbms: Followups to automatic connection recovery patch"
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:41 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792141{{!}}Revert "cirrus: Turn on AB test of wbsearchentities profiles" (T306644)]] (duration: 00m 51s)
* 20:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:36 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] and [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22798 and previous config saved to /var/cache/conftool/dbconfig/20220317-204128-marostegui.json
* 20:34 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-yi.svg: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 49s)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:33 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-he.svg: Config: [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:31 catrope@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:29 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:763779{{!}}Revert "Enable Parsoid API everywhere" (T302081)]] (duration: 00m 50s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:29 catrope@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22797 and previous config saved to /var/cache/conftool/dbconfig/20220317-202623-marostegui.json
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791725{{!}}thwikibooks: Enable import (T308374)]] (duration: 00m 51s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 catrope@deploy1002: Synchronized wmf-config: Config: [[gerrit:792149{{!}}GrowthExperiments: Update campaigns benefit list config (T305659)]] (duration: 00m 51s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22796 and previous config saved to /var/cache/conftool/dbconfig/20220317-201118-marostegui.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22795 and previous config saved to /var/cache/conftool/dbconfig/20220317-195613-marostegui.json
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:53 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.26/skins/Vector/includes/Hooks.php: Backport: [[gerrit:771395{{!}}Fix updateUserLinksDropdownItems not being called (T304002)]] (duration: 00m 50s)
* 18:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:27 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:18 akosiaris: cordon kubernetes10<nowiki>{</nowiki>18..22<nowiki>}</nowiki> [[phab:T293728|T293728]]
* 18:12 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 18:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:41 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:41 arturo: uploaded prometheus-openstack-exporter 0.0.8-4~wmf1 to bullseye-wikimedia ([[phab:T302178|T302178]])
* 17:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:34 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 17:34 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 17:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:28 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 17:28 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 17:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
* 17:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
* 17:25 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:24 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
* 17:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
* 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
* 17:21 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:21 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:20 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
* 17:20 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
* 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
* 17:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:15 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
* 17:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:10 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
* 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1022.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1021.eqiad.wmnet with OS bullseye
* 17:07 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1019.eqiad.wmnet with OS bullseye
* 17:06 bblack: geodns - Cyprus routed to new drmrs edge DC (first live users!) - will phase in over the standard 10 minute DNS TTL
* 17:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1018.eqiad.wmnet with OS bullseye
* 17:03 volans: restart atftp on install1003
* 17:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:50 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:48 XioNoX: disable BGP to Lumen in codfw for fiber move
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22794 and previous config saved to /var/cache/conftool/dbconfig/20220317-164228-marostegui.json
* 16:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
* 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:36 moritzm: restarting LDAP replicas for openssl update
* 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
* 16:35 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
* 16:34 ryankemper: [WDQS] Pooled `wdqs2001` (caught up on lag)
* 16:31 andrewbogott: sudo service networking restart on puppetmaster1003
* 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22793 and previous config saved to /var/cache/conftool/dbconfig/20220317-162723-marostegui.json
* 16:15 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22792 and previous config saved to /var/cache/conftool/dbconfig/20220317-161218-marostegui.json
* 16:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:10 XioNoX: pfw3-codfw move traffic to cr2 uplink
* 16:05 oblivian@puppetmaster1001: conftool action : edit; selector: name=random_q
* 16:04 ryankemper: [WDQS] Depooled `wdqs2001` (~4.85 hours of lag to catch up)
* 16:03 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
* 16:03 ryankemper: [WDQS] Pooled `wdqs2003` (caught up on lag)
* 16:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:00 moritzm: restarting apache on logstash*
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60980ce85c080fadaf0b2cb561be53f861ca94e0}}: ptwiki: Disable Growth image recommendation ([[phab:T302828|T302828]]) (duration: 00m 53s)
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22790 and previous config saved to /var/cache/conftool/dbconfig/20220317-155713-marostegui.json
* 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:46 XioNoX: cr1-codfw move xe-5/2/0 to xe-1/0/1:1 - [[phab:T289241|T289241]]
* 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:34 moritzm: restarting FPM on mw canaries
* 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
* 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
* 15:30 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
* 15:07 XioNoX: disable BGP to Telia in codfw for fiber move - [[phab:T289241|T289241]]
* 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
* 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22789 and previous config saved to /var/cache/conftool/dbconfig/20220317-145716-marostegui.json
* 14:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22788 and previous config saved to /var/cache/conftool/dbconfig/20220317-145708-marostegui.json
* 14:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22785 and previous config saved to /var/cache/conftool/dbconfig/20220317-144203-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22784 and previous config saved to /var/cache/conftool/dbconfig/20220317-142658-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22783 and previous config saved to /var/cache/conftool/dbconfig/20220317-141152-marostegui.json
* 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: [[phab:T303151|T303151]]
* 13:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 13:17 Lucas_WMDE: UTC afternoon backport window done
* 13:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771595{{!}}commonswiki: Add pictures.snsb.info to wgCopyUploadsDomains allowlist (T303929)]] (duration: 00m 50s)
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22782 and previous config saved to /var/cache/conftool/dbconfig/20220317-131227-marostegui.json
* 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22781 and previous config saved to /var/cache/conftool/dbconfig/20220317-131220-marostegui.json
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (3/3) (duration: 00m 49s)
* 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (2/3) (duration: 00m 49s)
* 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768089{{!}}Write "unexpectedUnconnectedPage" page prop on Beta]] – no expected behavior change in production (1/3) (duration: 00m 53s)
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22780 and previous config saved to /var/cache/conftool/dbconfig/20220317-125715-marostegui.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22779 and previous config saved to /var/cache/conftool/dbconfig/20220317-124209-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22778 and previous config saved to /var/cache/conftool/dbconfig/20220317-122704-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22777 and previous config saved to /var/cache/conftool/dbconfig/20220317-120700-root.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22776 and previous config saved to /var/cache/conftool/dbconfig/20220317-115156-root.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22775 and previous config saved to /var/cache/conftool/dbconfig/20220317-115012-root.json
* 11:42 volans: upgraded spicerack on cumin hosts to v2.3.3
* 11:41 volans: uploaded spicerack_2.3.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22774 and previous config saved to /var/cache/conftool/dbconfig/20220317-113652-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22773 and previous config saved to /var/cache/conftool/dbconfig/20220317-113508-root.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22772 and previous config saved to /var/cache/conftool/dbconfig/20220317-112921-marostegui.json
* 11:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 11:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22771 and previous config saved to /var/cache/conftool/dbconfig/20220317-112913-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22770 and previous config saved to /var/cache/conftool/dbconfig/20220317-112148-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22769 and previous config saved to /var/cache/conftool/dbconfig/20220317-112004-root.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22768 and previous config saved to /var/cache/conftool/dbconfig/20220317-111408-marostegui.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22767 and previous config saved to /var/cache/conftool/dbconfig/20220317-110645-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22766 and previous config saved to /var/cache/conftool/dbconfig/20220317-110536-marostegui.json
* 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22765 and previous config saved to /var/cache/conftool/dbconfig/20220317-105903-marostegui.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22764 and previous config saved to /var/cache/conftool/dbconfig/20220317-105349-marostegui.json
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-fe[1005-1008].eqiad.wmnet
* 10:47 marostegui: dbmaint on s3@eqiad [[phab:T298556|T298556]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22763 and previous config saved to /var/cache/conftool/dbconfig/20220317-104358-marostegui.json
* 10:40 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22762 and previous config saved to /var/cache/conftool/dbconfig/20220317-103844-marostegui.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22761 and previous config saved to /var/cache/conftool/dbconfig/20220317-103726-marostegui.json
* 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22760 and previous config saved to /var/cache/conftool/dbconfig/20220317-103719-marostegui.json
* 10:31 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:26 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-fe[1005-1008].eqiad.wmnet
* 10:24 marostegui: dbmaint on s3@codfw [[phab:T298556|T298556]]
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22759 and previous config saved to /var/cache/conftool/dbconfig/20220317-102214-marostegui.json
* 10:10 marostegui: dbmaint on s7@eqiad [[phab:T298556|T298556]]
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22758 and previous config saved to /var/cache/conftool/dbconfig/20220317-100709-marostegui.json
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22757 and previous config saved to /var/cache/conftool/dbconfig/20220317-095204-marostegui.json
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298556|T298556]])', diff saved to https://phabricator.wikimedia.org/P22756 and previous config saved to /var/cache/conftool/dbconfig/20220317-095044-marostegui.json
* 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22755 and previous config saved to /var/cache/conftool/dbconfig/20220317-094025-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22754 and previous config saved to /var/cache/conftool/dbconfig/20220317-094017-marostegui.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22752 and previous config saved to /var/cache/conftool/dbconfig/20220317-092512-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22751 and previous config saved to /var/cache/conftool/dbconfig/20220317-091911-marostegui.json
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22750 and previous config saved to /var/cache/conftool/dbconfig/20220317-091007-marostegui.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22749 and previous config saved to /var/cache/conftool/dbconfig/20220317-085502-marostegui.json
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clarakosi out of all services on: 1881 hosts
* 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Clarakosi out of all services on: 1881 hosts
* 08:24 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|0da40c22844746120de9b33e772598d38aa74326}}: throttle: Remove expired rules (duration: 00m 50s)
* 08:23 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|980ea35d454563e538d08b9d6462064455b4d28c}}: Throttle: Increase limit for English Wikipedia ([[phab:T304016|T304016]]) (duration: 00m 51s)
* 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ppchelko out of all services on: 1881 hosts
* 08:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ppchelko out of all services on: 1881 hosts
* 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Accraze out of all services on: 1881 hosts
* 08:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Accraze out of all services on: 1881 hosts
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22748 and previous config saved to /var/cache/conftool/dbconfig/20220317-080705-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22747 and previous config saved to /var/cache/conftool/dbconfig/20220317-075350-marostegui.json
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22746 and previous config saved to /var/cache/conftool/dbconfig/20220317-075201-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22745 and previous config saved to /var/cache/conftool/dbconfig/20220317-073658-root.json
* 07:31 marostegui: dbmaint on s5@eqiad [[phab:T297189|T297189]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22744 and previous config saved to /var/cache/conftool/dbconfig/20220317-072154-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json
* 07:11 ryankemper: [WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on)
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 06:57 ryankemper: [WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22
* 06:57 ryankemper: [WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently.
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json
* 06:54 ryankemper: [WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service`
* 06:53 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
* 06:50 elukey: restart blazegraph on wdqs2004
* 06:46 elukey: kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow user offboarding (puppet broken)
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22739 and previous config saved to /var/cache/conftool/dbconfig/20220317-062648-root.json
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22738 and previous config saved to /var/cache/conftool/dbconfig/20220317-061144-root.json
* 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22737 and previous config saved to /var/cache/conftool/dbconfig/20220317-040634-marostegui.json
* 04:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 02:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 01:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
 
== 2022-03-16 ==
* 23:52 tzatziki: Removing two files for legal compliance
* 21:17 cjming: end running skin update preference maintenance script
* 20:52 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [no-op] {{Gerrit|8efa537}}: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion ([[phab:T303240|T303240]]) (duration: 00m 53s)
* 20:38 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: {{Gerrit|9ba157b}}: Add insert option for update skin preferences script ([[phab:T299104|T299104]]) (duration: 00m 50s)
* 20:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: {{Gerrit|ebfc516}}: Add script to update vector skin preferences ([[phab:T299104|T299104]]) (duration: 00m 51s)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 20:13 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:13 urbanecm@deploy1002: Synchronized docroot/noc/db.php: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 3/3) (duration: 00m 49s)
* 20:12 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 2/3) (duration: 00m 50s)
* 20:11 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|f649199}}: Migrate wmfDatacenter(s) to wmgDatacenter(s) ([[phab:T45956|T45956]]; 1/3) (duration: 00m 50s)
* 19:22 otto@deploy1002: Finished deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided) (duration: 07m 50s)
* 19:14 otto@deploy1002: Started deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided)
* 18:32 sukhe: running: homer "cr*-drmrs*" commit "Gerrit 771359: Set up BGP peering in drmrs for Wikidough."
* 18:09 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f] (duration: 00m 08s)
* 18:09 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f]
* 18:02 aqu@deploy1002: Finished deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f] (duration: 00m 08s)
* 18:02 aqu@deploy1002: Started deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f]
* 18:00 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 18:00 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 17:36 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: [[gerrit:771001{{!}}mwscript: Support --force-version flag (T303878)]] (duration: 00m 57s)
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 17:13 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 07m 23s)
* 17:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 17:06 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 00m 07s)
* 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 16:48 aqu@deploy1002: Finished deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 25m 49s)
* 16:45 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:37 Emperor: rolling restart of ms-fe10[09-12] so they know about removal of older proxies [[phab:T303733|T303733]]
* 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 16:28 Emperor: moving swiftrepl and stats reporter host from ms-fe1005 to ms-fe1009 [[phab:T303733|T303733]]
* 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 16:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22734 and previous config saved to /var/cache/conftool/dbconfig/20220316-162721-marostegui.json
* 16:22 aqu@deploy1002: Started deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22733 and previous config saved to /var/cache/conftool/dbconfig/20220316-161216-marostegui.json
* 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 16:02 aqu: analytics/refinery - scap deploy "Migrate session_length/daily from Oozie to Airflow"
* 15:59 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22732 and previous config saved to /var/cache/conftool/dbconfig/20220316-155711-marostegui.json
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22731 and previous config saved to /var/cache/conftool/dbconfig/20220316-155300-marostegui.json
* 15:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 15:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
* 15:51 moritzm: restarting exim/spamassassin on MXes to pick up new OpenSSL
* 15:49 urbanecm@deploy1002: Synchronized wmf-config/logos.php: cswiki celebration logo (duration: 00m 49s)
* 15:46 urbanecm@deploy1002: Synchronized static/images/project-logos/: cswiki celebration logos (duration: 00m 50s)
* 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:43 dancy@deploy1002: scap failed: RuntimeError dictionary changed size during iteration (duration: 25m 55s)
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22730 and previous config saved to /var/cache/conftool/dbconfig/20220316-154206-marostegui.json
* 15:38 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 15:37 ryankemper: [WCQS] Restarted updater across fleet to get out jvm sec upgrades: `ryankemper@cumin1001:~$ sudo -E cumin 'wcqs*' 'systemctl restart wcqs-updater.service'`
* 15:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 15:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 15:17 dancy@deploy1002: Started scap: testing mediawiki image build
* 15:15 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediaw
* 15:12 dancy@deploy1002: Started scap: (no justification provided)
* 15:11 dancy: Testing mediawiki image build on deploy server again
* 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 15:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 15:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22729 and previous config saved to /var/cache/conftool/dbconfig/20220316-150433-marostegui.json
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22728 and previous config saved to /var/cache/conftool/dbconfig/20220316-145946-marostegui.json
* 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:55 sukhe: rolling restart of nginx.service on durum* hosts for OpenSSL updates
* 14:55 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: [[gerrit:770937{{!}}Add script to update vector skin preferences (T299104)]] (duration: 00m 51s)
* 14:53 moritzm: restarting nginx/dhcpd on install/apt servers
* 14:53 sukhe: rolling restart of pdns-recursor.service and dnsdist.service on doh* hosts for OpenSSL updates
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22727 and previous config saved to /var/cache/conftool/dbconfig/20220316-144928-marostegui.json
* 14:47 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
* 14:35 XioNoX: add anycast6 peers in drmrs
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22726 and previous config saved to /var/cache/conftool/dbconfig/20220316-143423-marostegui.json
* 14:25 Emperor: depooling ms-fe100[5-8] prior to decommissioning [[phab:T303733|T303733]]
* 14:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22725 and previous config saved to /var/cache/conftool/dbconfig/20220316-141918-marostegui.json
* 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
* 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22724 and previous config saved to /var/cache/conftool/dbconfig/20220316-141708-marostegui.json
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:12 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php: Backport: [[gerrit:770942{{!}}Replace use of deprecated RecentChange::getEngine (T303861)]] (duration: 00m 51s)
* 14:10 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service [[phab:T303064|T303064]]
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22723 and previous config saved to /var/cache/conftool/dbconfig/20220316-140203-marostegui.json
* 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22722 and previous config saved to /var/cache/conftool/dbconfig/20220316-134658-marostegui.json
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22721 and previous config saved to /var/cache/conftool/dbconfig/20220316-133458-marostegui.json
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22720 and previous config saved to /var/cache/conftool/dbconfig/20220316-133153-marostegui.json
* 13:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
* 13:25 krinkle@deploy1002: Synchronized w/static.php: {{Gerrit|159dfd21d}} (duration: 00m 50s)
* 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS buster
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22718 and previous config saved to /var/cache/conftool/dbconfig/20220316-131953-marostegui.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22717 and previous config saved to /var/cache/conftool/dbconfig/20220316-131429-marostegui.json
* 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22716 and previous config saved to /var/cache/conftool/dbconfig/20220316-131421-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:07 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771331{{!}}Deploy template features to enwiki (T302857)]] (duration: 00m 50s)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22715 and previous config saved to /var/cache/conftool/dbconfig/20220316-130448-marostegui.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22714 and previous config saved to /var/cache/conftool/dbconfig/20220316-125916-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22713 and previous config saved to /var/cache/conftool/dbconfig/20220316-125803-marostegui.json
* 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22712 and previous config saved to /var/cache/conftool/dbconfig/20220316-125755-marostegui.json
* 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22711 and previous config saved to /var/cache/conftool/dbconfig/20220316-124943-marostegui.json
* 12:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22710 and previous config saved to /var/cache/conftool/dbconfig/20220316-124742-marostegui.json
* 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22709 and previous config saved to /var/cache/conftool/dbconfig/20220316-124734-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22708 and previous config saved to /var/cache/conftool/dbconfig/20220316-124411-marostegui.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22707 and previous config saved to /var/cache/conftool/dbconfig/20220316-124250-marostegui.json
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22705 and previous config saved to /var/cache/conftool/dbconfig/20220316-123229-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22704 and previous config saved to /var/cache/conftool/dbconfig/20220316-122906-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22703 and previous config saved to /var/cache/conftool/dbconfig/20220316-122745-marostegui.json
* 12:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22702 and previous config saved to /var/cache/conftool/dbconfig/20220316-121724-marostegui.json
* 12:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22701 and previous config saved to /var/cache/conftool/dbconfig/20220316-121240-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22700 and previous config saved to /var/cache/conftool/dbconfig/20220316-120219-marostegui.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22699 and previous config saved to /var/cache/conftool/dbconfig/20220316-120100-marostegui.json
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22698 and previous config saved to /var/cache/conftool/dbconfig/20220316-120047-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22697 and previous config saved to /var/cache/conftool/dbconfig/20220316-114542-marostegui.json
* 11:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22695 and previous config saved to /var/cache/conftool/dbconfig/20220316-113200-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22694 and previous config saved to /var/cache/conftool/dbconfig/20220316-113152-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22693 and previous config saved to /var/cache/conftool/dbconfig/20220316-113057-marostegui.json
* 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22692 and previous config saved to /var/cache/conftool/dbconfig/20220316-113037-marostegui.json
* 11:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22691 and previous config saved to /var/cache/conftool/dbconfig/20220316-111647-marostegui.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22690 and previous config saved to /var/cache/conftool/dbconfig/20220316-111532-marostegui.json
* 11:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22689 and previous config saved to /var/cache/conftool/dbconfig/20220316-110411-marostegui.json
* 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22688 and previous config saved to /var/cache/conftool/dbconfig/20220316-110403-marostegui.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22687 and previous config saved to /var/cache/conftool/dbconfig/20220316-110142-marostegui.json
* 10:55 vgutierrez: rolling upgrade to HAProxy 2.4.15 on cache nodes
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22686 and previous config saved to /var/cache/conftool/dbconfig/20220316-104858-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22685 and previous config saved to /var/cache/conftool/dbconfig/20220316-104637-marostegui.json
* 10:42 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22684 and previous config saved to /var/cache/conftool/dbconfig/20220316-103353-marostegui.json
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22683 and previous config saved to /var/cache/conftool/dbconfig/20220316-101848-marostegui.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22682 and previous config saved to /var/cache/conftool/dbconfig/20220316-101729-marostegui.json
* 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 10 hosts with reason: Maintenance
* 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 10 hosts with reason: Maintenance
* 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:15 vgutierrez: rolling restart of ats-tls and ats-backend to catch up on OpenSSL updates
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22681 and previous config saved to /var/cache/conftool/dbconfig/20220316-101502-marostegui.json
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22680 and previous config saved to /var/cache/conftool/dbconfig/20220316-100527-marostegui.json
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22679 and previous config saved to /var/cache/conftool/dbconfig/20220316-100519-marostegui.json
* 10:04 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
* 10:01 moritzm: installing openssl security updates
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22678 and previous config saved to /var/cache/conftool/dbconfig/20220316-095957-marostegui.json
* 09:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
* 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS stretch
* 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22677 and previous config saved to /var/cache/conftool/dbconfig/20220316-095014-marostegui.json
* 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22676 and previous config saved to /var/cache/conftool/dbconfig/20220316-094452-marostegui.json
* 09:36 dcausse: [[phab:T293862|T293862]]: manually restarted blazegraph on wdqs1010 with "-agentpath:/usr/lib/libjvmquake.so=1000,1,0,warn=30,touch=/tmp/jvmquake"
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22675 and previous config saved to /var/cache/conftool/dbconfig/20220316-093509-marostegui.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22674 and previous config saved to /var/cache/conftool/dbconfig/20220316-092947-marostegui.json
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22673 and previous config saved to /var/cache/conftool/dbconfig/20220316-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22672 and previous config saved to /var/cache/conftool/dbconfig/20220316-092735-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22671 and previous config saved to /var/cache/conftool/dbconfig/20220316-092004-marostegui.json
* 09:16 moritzm: revert mx1001/mx2001 to the Bullseye version of Exim [[phab:T303738|T303738]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 [[phab:T303498|T303498]]', diff saved to https://phabricator.wikimedia.org/P22670 and previous config saved to /var/cache/conftool/dbconfig/20220316-091533-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22669 and previous config saved to /var/cache/conftool/dbconfig/20220316-091229-marostegui.json
* 09:09 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 08:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22668 and previous config saved to /var/cache/conftool/dbconfig/20220316-085724-marostegui.json
* 08:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:52 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22667 and previous config saved to /var/cache/conftool/dbconfig/20220316-084219-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22666 and previous config saved to /var/cache/conftool/dbconfig/20220316-084140-marostegui.json
* 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22665 and previous config saved to /var/cache/conftool/dbconfig/20220316-084127-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22664 and previous config saved to /var/cache/conftool/dbconfig/20220316-084011-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22663 and previous config saved to /var/cache/conftool/dbconfig/20220316-084003-marostegui.json
* 08:35 hashar: Restarting CI Jenkins
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22662 and previous config saved to /var/cache/conftool/dbconfig/20220316-082622-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22661 and previous config saved to /var/cache/conftool/dbconfig/20220316-082458-marostegui.json
* 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:770130{{!}}Change A/V player to videojs in the first batch of production wiki (T248418)]] (duration: 00m 49s)
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22660 and previous config saved to /var/cache/conftool/dbconfig/20220316-081117-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22659 and previous config saved to /var/cache/conftool/dbconfig/20220316-080953-marostegui.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22658 and previous config saved to /var/cache/conftool/dbconfig/20220316-075612-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22657 and previous config saved to /var/cache/conftool/dbconfig/20220316-075502-marostegui.json
* 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22656 and previous config saved to /var/cache/conftool/dbconfig/20220316-075448-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22655 and previous config saved to /var/cache/conftool/dbconfig/20220316-075248-marostegui.json
* 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22654 and previous config saved to /var/cache/conftool/dbconfig/20220316-075007-marostegui.json
* 07:49 Amir1: dbmaint on master of s4@eqiad ([[phab:T298743|T298743]])
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22653 and previous config saved to /var/cache/conftool/dbconfig/20220316-073502-marostegui.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22652 and previous config saved to /var/cache/conftool/dbconfig/20220316-071957-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22651 and previous config saved to /var/cache/conftool/dbconfig/20220316-071859-marostegui.json
* 07:18 urbanecm: UTC morning B&C window done
* 07:15 urbanecm: Create `testwiki.cx_significant_edits` and `testwiki.cx_section_translation` at s3 ([[phab:T302371|T302371]]; `mwscript sql.php --wiki=testwiki /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/<nowiki>{</nowiki>section-translations,significant-edits<nowiki>}</nowiki>.sql`)
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|455895168ab266813ae499e8fc353c66e6d5b450}}: Disable ContentTranslation for non-extended confirmed users on viwiki ([[phab:T299636|T299636]]) (duration: 00m 51s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22650 and previous config saved to /var/cache/conftool/dbconfig/20220316-070452-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22649 and previous config saved to /var/cache/conftool/dbconfig/20220316-070354-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P22648 and previous config saved to /var/cache/conftool/dbconfig/20220316-070033-marostegui.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22647 and previous config saved to /var/cache/conftool/dbconfig/20220316-065918-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22646 and previous config saved to /var/cache/conftool/dbconfig/20220316-064849-marostegui.json
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22644 and previous config saved to /var/cache/conftool/dbconfig/20220316-063344-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22643 and previous config saved to /var/cache/conftool/dbconfig/20220316-060008-marostegui.json
* 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22642 and previous config saved to /var/cache/conftool/dbconfig/20220316-055903-marostegui.json
* 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 05:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22641 and previous config saved to /var/cache/conftool/dbconfig/20220316-055805-marostegui.json
* 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 05:36 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 05:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1068.eqiad.wmnet with OS stretch
* 05:14 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
* 05:13 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS (duration: 01m 53s)
* 05:12 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.106` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 05:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS
* 05:11 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611]: 0.3.106 (duration: 06m 36s)
* 05:03 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.106` on canary `wdqs1003`; proceeding to rest of fleet
* 05:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611]: 0.3.106
* 05:01 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.106`. Pre-deploy tests passing on canary `wdqs1003`
* 02:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22640 and previous config saved to /var/cache/conftool/dbconfig/20220316-025347-marostegui.json
* 02:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22639 and previous config saved to /var/cache/conftool/dbconfig/20220316-023842-marostegui.json
* 02:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22638 and previous config saved to /var/cache/conftool/dbconfig/20220316-022336-marostegui.json
* 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22637 and previous config saved to /var/cache/conftool/dbconfig/20220316-020831-marostegui.json
* 01:43 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 01:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 01:37 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 01:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
* 00:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 00:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
* 00:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
 
== 2022-03-15 ==
* 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 22:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 22:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:05 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 22:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 22:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 22:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 22:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 21:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 21:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 21:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22635 and previous config saved to /var/cache/conftool/dbconfig/20220315-214729-marostegui.json
* 21:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22634 and previous config saved to /var/cache/conftool/dbconfig/20220315-214721-marostegui.json
* 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22633 and previous config saved to /var/cache/conftool/dbconfig/20220315-214133-ladsgroup.json
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebu