
Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(AaronSchulz: Running importMissingLocalNames.php on mwmaint1002 in a screen)
imported>Stashbot
(robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0))
 
(502 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2021-04-05 ==
== 2022-10-03 ==
* 23:17 AaronSchulz: Running importMissingLocalNames.php on mwmaint1002 in a screen
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:58 sbassett: re-deploy security patch for [[phab:T270453|T270453]] to wmf.37
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:50 sbassett: re-deploy security patch for [[phab:T270988|T270988]] to wmf.37
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 20:43 mholloway-shell@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for android.image_recommendation_interaction (duration: 00m 59s)
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby (duration: 01m 41s)
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 19:30 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 19:08 andrew@deploy1002: Finished deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon (duration: 02m 04s)
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 19:06 andrew@deploy1002: Started deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 18:28 tgr_: Morning deploys done
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}'`); will restart elasticsearch-psi after shards drain
* 18:28 tgr@deploy1002: Synchronized dblists/growthexperiments.dblist: Config: [[gerrit:676654{{!}}Fix growthexperiments.dblist (T275171)]] (duration: 00m 58s)
* 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 18:27 tgr@deploy1002: Synchronized wmf-config/config/frwiki.yaml: Config: [[gerrit:676654{{!}}Fix growthexperiments.dblist (T275171)]] (duration: 00m 59s)
* 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 17:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 17:05 dpifke@deploy1002: Finished deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006 (duration: 00m 05s)
* 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:05 dpifke@deploy1002: Started deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006
* 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
* 16:45 Urbanecm: Start server-side upload of 4 video files ([[phab:T279204|T279204]], [[phab:T279201|T279201]], [[phab:T279200|T279200]], [[phab:T279198|T279198]])
* 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 14:43 XioNoX: push pfw policies - [[phab:T278970|T278970]]
* 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15149 and previous config saved to /var/cache/conftool/dbconfig/20210405-140825-root.json
* 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15148 and previous config saved to /var/cache/conftool/dbconfig/20210405-140751-root.json
* 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15147 and previous config saved to /var/cache/conftool/dbconfig/20210405-135321-root.json
* 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15146 and previous config saved to /var/cache/conftool/dbconfig/20210405-135248-root.json
* 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15145 and previous config saved to /var/cache/conftool/dbconfig/20210405-133818-root.json
* 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15144 and previous config saved to /var/cache/conftool/dbconfig/20210405-133744-root.json
* 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
* 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15143 and previous config saved to /var/cache/conftool/dbconfig/20210405-132314-root.json
* 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15142 and previous config saved to /var/cache/conftool/dbconfig/20210405-132240-root.json
* 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for upgrade', diff saved to https://phabricator.wikimedia.org/P15141 and previous config saved to /var/cache/conftool/dbconfig/20210405-131221-marostegui.json
* 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P15140 and previous config saved to /var/cache/conftool/dbconfig/20210405-124118-marostegui.json
* 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15139 and previous config saved to /var/cache/conftool/dbconfig/20210405-123751-root.json
* 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15138 and previous config saved to /var/cache/conftool/dbconfig/20210405-122247-root.json
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15137 and previous config saved to /var/cache/conftool/dbconfig/20210405-120744-root.json
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scb[1001-1004].eqiad.wmnet
* 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
* 12:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts scb[2001-2006].codfw.wmnet
* 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15136 and previous config saved to /var/cache/conftool/dbconfig/20210405-115240-root.json
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[1001-1004].eqiad.wmnet
* 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:09 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
* 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts scb[2001-2006].codfw.wmnet
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
* 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P15135 and previous config saved to /var/cache/conftool/dbconfig/20210405-110506-marostegui.json
* 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15134 and previous config saved to /var/cache/conftool/dbconfig/20210405-105731-root.json
* 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15133 and previous config saved to /var/cache/conftool/dbconfig/20210405-105715-root.json
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15132 and previous config saved to /var/cache/conftool/dbconfig/20210405-104227-root.json
* 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15131 and previous config saved to /var/cache/conftool/dbconfig/20210405-104211-root.json
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P15130 and previous config saved to /var/cache/conftool/dbconfig/20210405-104010-marostegui.json
* 16:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] (duration: 04m 16s)
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P15129 and previous config saved to /var/cache/conftool/dbconfig/20210405-103318-marostegui.json
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15128 and previous config saved to /var/cache/conftool/dbconfig/20210405-103301-root.json
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15127 and previous config saved to /var/cache/conftool/dbconfig/20210405-102724-root.json
* 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15126 and previous config saved to /var/cache/conftool/dbconfig/20210405-102708-root.json
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15125 and previous config saved to /var/cache/conftool/dbconfig/20210405-101757-root.json
* 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15124 and previous config saved to /var/cache/conftool/dbconfig/20210405-101213-root.json
* 16:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]]
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15123 and previous config saved to /var/cache/conftool/dbconfig/20210405-101204-root.json
* 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cae49b85d2d780e34b553789d56d76bac4a62c48}}: throttle: Add throttle rule for 2022-10-06 ([[phab:T319212|T319212]]) (duration: 04m 21s)
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15122 and previous config saved to /var/cache/conftool/dbconfig/20210405-100253-root.json
* 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out [[phab:T309651|T309651]]
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P15121 and previous config saved to /var/cache/conftool/dbconfig/20210405-100246-marostegui.json
* 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out [[phab:T309651|T309651]]
* 09:50 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15120 and previous config saved to /var/cache/conftool/dbconfig/20210405-094744-root.json
* 15:06 papaul: maintenance complete on mr1-esams
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15119 and previous config saved to /var/cache/conftool/dbconfig/20210405-091043-marostegui.json
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15118 and previous config saved to /var/cache/conftool/dbconfig/20210405-082521-marostegui.json
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15117 and previous config saved to /var/cache/conftool/dbconfig/20210405-080523-root.json
* 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15116 and previous config saved to /var/cache/conftool/dbconfig/20210405-075019-root.json
* 14:31 papaul: ongoing maintenance on mr1-esams
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15115 and previous config saved to /var/cache/conftool/dbconfig/20210405-073515-root.json
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15114 and previous config saved to /var/cache/conftool/dbconfig/20210405-072012-root.json
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15113 and previous config saved to /var/cache/conftool/dbconfig/20210405-064727-marostegui.json
* 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15112 and previous config saved to /var/cache/conftool/dbconfig/20210405-054951-marostegui.json
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
* 05:30 marostegui: Deploy schema change on db1121, lag will appear on s4 on wikireplicas
* 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: [[phab:T309651|T309651]]
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P15111 and previous config saved to /var/cache/conftool/dbconfig/20210405-053000-marostegui.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
* 05:12 marostegui: Restart all sanitarium hosts to pick up new filters [[phab:T278573|T278573]]
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
* 13:18 vgutierrez: enforcing origin-form{{!}}asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - [[phab:T318676|T318676]]
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
* 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
* 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
* 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it was expired anyways
* 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
* 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
* 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
* 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
* 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
* 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
* 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
* 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
* 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
* 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
* 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
* 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
* 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
* 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
* 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
* 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
* 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
* 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
* 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
* 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
* 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
* 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
* 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
* 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
* 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
* 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
* 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
* 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
* 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
* 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json


== 2021-04-04 ==
== 2022-10-02 ==
* 14:47 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 01m 36s)
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition
* 14:45 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch


== 2021-04-03 ==
== 2022-10-01 ==
* 19:20 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 11s)
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 19:18 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 17:30 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 35s)
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 17:26 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
* 16:44 elukey: power reset for ms-be2028 - not reachable via ssh, no tty available via mgmt console, NMI unrecoverable errors logged in iLo's system logs
* 15:35 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 18s)
* 15:33 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 15:12 andrew@deploy1002: Finished deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch (duration: 11m 51s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch
* 05:38 andrew@deploy1002: Finished deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 05s)
* 05:35 andrew@deploy1002: Started deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch


== 2021-04-02 ==
== 2022-09-30 ==
* 22:31 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:31 bstorm@cumin1001: Added views for new wiki: trvwiki [[phab:T276246|T276246]]
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:08 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 22:08 mutante: pooled mw2395,mw2396 as API appservers running on new hardware
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[5-6].codfw.wmnet
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 21:58 legoktm: legoktm@lists1002:~$ time sudo mailman-web rebuild_index
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 21:56 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[5-6].codfw.wmnet
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[5-6].codfw.wmnet
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 21:48 mutante: mw2395, mw2396 - reboot - becoming API servers
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[0-4].codfw.wmnet
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 21:42 mutante: pooled 12 brand-new codfw appservers running on new hardware generation
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw238[5-9].codfw.wmnet
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2384.codfw.wmnet
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2383.codfw.wmnet
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 21:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 21:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[0-4].codfw.wmnet
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 21:34 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw238[3-9].codfw.wmnet
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 21:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 21:28 legoktm: imported python-xapian-haystack 2.1.0-6~wmf1 on apt1001 ([[phab:T278717|T278717]])
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2393.codfw.wmnet
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2392.codfw.wmnet
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2391.codfw.wmnet
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2390.codfw.wmnet
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2389.codfw.wmnet
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2388.codfw.wmnet
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2387.codfw.wmnet
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2386.codfw.wmnet
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2385.codfw.wmnet
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2384.codfw.wmnet
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2383.codfw.wmnet
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 21:19 mutante: generating mcrouter certs for mw2395 through mw2404 ([[phab:T278396|T278396]])
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 21:07 mutante: mw2383 through mw2394 - 'uptime && scap pull' via ssh -C (not cumin because it needs to run as non-root)
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 20:58 mutante: mw238* - scap pull via cumin not possible because it doesn't work as root
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 20:50 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: tweak to affinity group options (duration: 03m 39s)
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 20:46 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: tweak to affinity group options
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 20:44 mutante: mw2385 through mw2394 - serial rebooting
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 20:43 mutante: mw2384 reboot
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: new_install
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: new_install
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev (duration: 01m 47s)
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 20:39 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 20:09 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 20:09 bstorm@cumin1001: Added views for new wiki: taywiki [[phab:T275836|T275836]]
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 19:47 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 19:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 19:07 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 19:07 bstorm@cumin1001: Added views for new wiki: mnwwiktionary [[phab:T276126|T276126]]
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:44 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 18:44 mutante: [puppetmaster1001:~] $ sudo puppet node deactivate mw2247.codfw.wmnet
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 18:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 17:57 legoktm: upgraded mailman3 python3-django-postorius on lists1002
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:41 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 14:35 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw133[7-8].eqiad.wmnet
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 14:34 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=videoscaler,name=mw133[5-6].eqiad.wmnet
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 14:32 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw133[5-6].eqiad.wmnet
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 14:31 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw133[7-8].eqiad.wmnet
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 14:29 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1111.eqiad.wmnet
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 14:20 Urbanecm: Start server-side upload for 3 video files ([[phab:T279060|T279060]], [[phab:T279061|T279061]], [[phab:T279062|T279062]])
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 14:09 Urbanecm: Start server-side upload for 3 video files ([[phab:T279138|T279138]], [[phab:T279137|T279137]], [[phab:T279136|T279136]])
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 13:42 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.37
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 13:11 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/load.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 13:10 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: [[phab:T278579|T278579]] (duration: 00m 57s)
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 13:08 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 11:46 Urbanecm: correction: Start server-side upload for 3 video files ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 11:45 Urbanecm: Start server-side upload for 3 images ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 10:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 10:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback group0 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 09:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 and group2 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 09:44 hashar@deploy1002: sync-wikiversions aborted: Revert group1 and group2 wikis to 1.36.0-wmf.36 (duration: 00m 01s)
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 09:06 dcausse: remove dumps from wdqs1009 to free disk space
* 07:33 effie: powercycle an-worker1080
* 07:28 elukey: manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b
* 03:54 dwisehaupt: replication user on fundraising db set to require ssl for connections at the mysql user level. db updated on frdb1004 and verified on a set of hosts
* 03:16 dwisehaupt: replication user on payments db set to require ssl for connections at the mysql user level. db updated on payments1001 and verified on a set of hosts


== 2021-04-01 ==
== 2022-09-29 ==
* 23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:676350{{!}}Revert "Turn on glent m1 AB test"]] [[phab:T262612|T262612]] (duration: 00m 58s)
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to {{Gerrit|1be781d}} (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III [[gerrit:676451
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance


== 2021-03-31 ==
== 2022-09-28 ==
* 23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|bfc8f55196f57e43c0abc8a16d81cb3b390ac94a}}: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|ad564a098f9174d76ff5c95adec20064ddde7bc9}}: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:674698{{!}}Include private folder in restricted image (T276145)]] (duration: 01m 08s)
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:668241{{!}}Use the new mediawiki logos]], part II ([[phab:T268230|T268230]]) (duration: 01m 11s)
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 23:03 ladsgroup@deploy1002: Synchronized static: [[gerrit:668241{{!}}Use the new mediawiki logos]], part I ([[phab:T268230|T268230]]) (duration: 01m 09s)
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:58 Urbanecm: Start server-side upload for 3 files
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:01 Urbanecm: Server side upload of three video files ([[phab:T279011|T279011]], [[phab:T278956|T278956]], [[phab:T278955|T278955]])
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 22:01 eileen: civicrm revision changed from {{Gerrit|2fcea570bd}} to {{Gerrit|740e49d868}}, config revision is {{Gerrit|6779e3829a}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 01m 08s)
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
* 19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
* 19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs  [[phab:T278376|T278376]] [[phab:T278343|T278343]] (duration: 00m 58s)
* 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
* 17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36  refs [[phab:T278343|T278343]]
* 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
* 17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
* 17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs [[phab:T278343|T278343]]
* 20:39 TheresNoTime: closing UTC late backport window
* 17:01 Urbanecm: Server side upload of three video files ([[phab:T278959|T278959]], [[phab:T278958|T278958]], [[phab:T278957|T278957]])
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:57 papaul: disconnecting ps1-d8-codfw for replacement
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] (duration: 06m 19s)
* 14:02 Urbanecm: Server side upload of two video files ([[phab:T278961|T278961]], [[phab:T278960|T278960]])
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:48 jynus: retrying s3 snapshot on codfw
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for [[phab:T278220|T278220]]
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing [[phab:T278220|T278220]]
* 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
* 20:18 samtar@deploy1002: Started scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]]
* 12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
* 20:11 samtar@deploy1002: Sync cancelled.
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 11:38 awight: EU deployment complete
* 20:04 samtar@deploy1002: samtar and dani: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: [[gerrit:675882{{!}}Style change to mediasearch logged-in notice close (T274927)]] [[gerrit:675883{{!}}Suppress user notice on mobile (T274927)]] [[gerrit:675881{{!}}Reset namespace filter on cancel (T276261)]] (duration: 01m 08s)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]]
* 11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:675509{{!}}vector: Disable WVUI search widget treatment A/B test (T276917)]] (duration: 01m 08s)
* 19:24 ejegg: updated fundraising CiviCRM from {{Gerrit|916a8b08}} to {{Gerrit|d31c19a0}}
* 10:48 effie: enable puppet on all mw* servers
* 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:10 effie: disable puppet on all mw* hosts
* 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:03 hashar: contint2001: enable puppet again
* 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
* 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
* 04:35 eileen: civicrm revision changed from {{Gerrit|7040b68c11}} to {{Gerrit|2fcea570bd}}, config revision is {{Gerrit|6779e3829a}}
* 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
* 02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
* 02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 03m 38s)
* 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
* 01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
* 01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
* 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
* 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
* 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
* 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
* 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
* 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
* 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
* 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
* 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
* 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
* 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
* 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
* 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
* 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
* 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
* 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
* 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
* 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
* 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
* 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
* 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
* 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
* 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
* 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:09 moritzm: installing twisted security updates
* 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
* 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
* 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
* 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
* 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
* 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
* 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
* 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
* 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
* 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
* 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
* 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
* 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
* 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
* 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
* 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
* 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
* 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
* 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
* 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
* 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
* 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
* 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
* 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
* 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
* 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
* 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
* 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
* 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
* 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
* 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
* 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
* 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
* 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
* 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
* 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
* 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
* 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
* 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
* 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
* 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
* 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
* 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
* 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
* 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
* 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
* 12:24 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
* 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
* 12:21 XioNoX: re-enable Init7 in knams
* 12:21 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
* 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
* 11:18 moritzm: installing expat security updates
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
* 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
* 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
* 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
* 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
* 08:49 jbond: disable puppet on cache servers to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
* 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
* 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
* 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:30 XioNoX: disable BGP to init7 in knams
* 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] (duration: 05m 17s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:03 kartik@deploy1002: Started scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]]
* 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
* 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
* 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
* 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}


== 2021-03-30 ==
== 2022-09-27 ==
* 23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings ([[phab:T278609|T278609]])
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef306a35464f295f43b874301cf0170edcfa4d8c}}: Growth features: bnwiki: Enable impact module ([[phab:T274793|T274793]]) (duration: 01m 07s)
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 22:52 cstone: civicrm revision changed from {{Gerrit|ad430721f6}} to {{Gerrit|7040b68c11}}
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 21:02 legoktm: scap pulling on mw1298
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:58 legoktm: killed remaining ffmpeg on mw1298
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 21:12 TheresNoTime: closing UTC late backport window
* 20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:59 TheresNoTime: extending UTC late backport window
* 20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 80m 32s)
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 20:02 twentyafterfour: while syncing the 1.36.0-wmf.37 promotion to testwikis, one server (mw1298.eqiad.wmnet) failed and two more appear to be hung: scap has been stuck at 99% (2 left) for a long time without making progress. refs [[phab:T278343|T278343]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 19:58 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:19 legoktm: killed all ffmpeg on mw1294
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 16:40 effie: enable puppet on mw* hosts
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 16:10 mutante: mw1296 - started ferm
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:10 mutante: mw1308 - started ferm
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:07 mutante: mw1309 - systemctl start ferm
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 15:59 akosiaris: depool a number of hosts from videoscalers
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 15:29 hnowlan: moving all test tables out of cassandra directories on aqs hosts
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 14:59 effie: disable puppet on mediawiki servers to deploy 663565
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki ([[phab:T278350|T278350]])
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 14:32 arturo: manually start update-openstack-mirror.service on sodium ([[phab:T278505|T278505]])
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 13:02 jbond42: rollout lxml update [[phab:T278822|T278822]]
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 12:55 jbond42: update spamassassin on lists, otrs and mx [[phab:T278820|T278820]]
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait ([[phab:T277060|T277060]])
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 12:38 jbond42: update python(3)-pygments
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675751{{!}}Disable legacy javascript global variables in group1]], Some increase in client errors is expected ([[phab:T72470|T72470]]) (duration: 01m 11s)
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 [[phab:T250666|T250666]]
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 08:05 dcausse: refreshing wdqs entities ([[phab:T278693|T278693]])
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out with memcached timeouts) - [[phab:T278734|T278734]]
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2021-03-29 ==
== 2022-09-26 ==
* 19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:16 ryankemper: [[phab:T267927|T267927]] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id [[phab:T267927|T267927]] --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 20:31 TheresNoTime: closing UTC late backport window
* 08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}} (duration: 00m 08s)
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}}
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - [[phab:T278478|T278478]] (duration: 01m 08s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: [[gerrit:675161{{!}}Wrap most of functionalities depending on protect mode in a condition]] ([[phab:T278478|T278478]]) (duration: 01m 08s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-27 ==
== 2022-09-25 ==
* 19:25 elukey: powercycle elastic1060 - [[phab:T278630|T278630]]
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 06:10 ryankemper: [[phab:T267927|T267927]] `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye


== 2021-03-26 ==
== 2022-09-23 ==
* 22:27 tzatziki: reset password for Philroc
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 12:56 moritzm: drain ganeti1014
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 12:37 moritzm: drain ganeti1013
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 09:28 moritzm: drain ganeti1012
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 08:38 moritzm: drain ganeti1010
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`


== 2021-03-25 ==
== 2022-09-22 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:55 brennen: end of utc late backport & config window
* 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 08s)
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 06s)
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' ` finished ([[phab:T275337|T275337]])
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # [[phab:T278391|T278391]]
* 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
* 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # [[phab:T275337|T275337]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|39cd4f15a3900783ac0e9a213004a28f18298a23}}: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 09s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcfb7feaace1f397169e5e1bab7efd4e5f605a0f}}: ruwiki: flaggedrevs: Do not remove autoreview group ([[phab:T275337|T275337]]) (duration: 01m 14s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|3fb664682bea3c4d1448b0937f938e810268bac3}}: ruwiki: flaggedrevs: Revoke review from sysop group ([[phab:T275811|T275811]]) (duration: 01m 06s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (3/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (2/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (1/3; [[phab:T275819|T275819]]) (duration: 01m 10s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62be4e738a4fd45256027bb09b010ab152f19850}}: Disable magic links on enwiki ([[phab:T275951|T275951]]) (duration: 01m 20s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 18:09 marxarelli: scap sync-file .pipeline Config: [[gerrit:674132{{!}}Include patches in restricted image (T271274)]]
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:06 hnowlan: draining and restarting aqs1004-b cassandra
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 17:45 hnowlan: draining and restarting aqs1004-a cassandra
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 16:32 moritzm: restarting apache on an-tool1007/turnilo
* 18:38 dancy@deploy1002: Started scap: testing
* 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 16:24 jbond42: restart slapd on ldap-replica
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 16:22 jbond42: restart slapd on ldap-corp
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 16:20 jbond42: restart apache on lists1002
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 16:18 jbond42: restart apache on netbox
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - [[phab:T278400|T278400]] (duration: 01m 32s)
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 16:12 jbond42: restart routinator on rpki*
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 16:12 moritzm: restarting nginx on apt*
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 16:10 moritzm: restarting apache on dbmonitor
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:08 moritzm: restart Apache on matomo/piwik
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:03 jbond42: restart apache service on gerrit
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:02 jbond42: restart idp service
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:01 ema: A:cp rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki>-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:45 moritzm: installing openssl updates on buster
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
* 16:39 dancy@deploy1002: Sync cancelled.
* 13:45 moritzm: drain ganeti1009
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:14 moritzm: drain ganeti1008
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674861{{!}}Disable Legacy javascript in fawikiquote]] ([[phab:T72470|T72470]]) (duration: 01m 07s)
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 11:46 moritzm: drain ganeti1007
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: [[gerrit:674382{{!}}Inform anonymous A/B test by tracking time from navigationStart (T275807)]] (duration: 01m 09s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 11:33 ladsgroup@deploy1002: Synchronized dblists/: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]], Part II ([[phab:T278369|T278369]]) (duration: 01m 07s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:32 ladsgroup@deploy1002: Synchronized wmf-config: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]] ([[phab:T278369|T278369]]) (duration: 01m 30s)
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:10 moritzm: drain ganeti1006
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
* 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
* 10:34 moritzm: drain ganeti1005
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 09:26 moritzm: drain ganeti2024
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 08:45 moritzm: drain ganeti2023
* 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
* 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
* 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 ([[phab:T276687|T276687]])
* 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia ([[phab:T276687|T276687]])
* 07:35 jynus: restart db2135 [[phab:T278408|T278408]] [[phab:T273281|T273281]]
* 07:05 effie: enable puppet on all mediawiki servers
* 06:57 XioNoX: Option 82: use-vlan-id
* 06:53 effie: enable puppet on jobrunners
* 06:47 effie: enable puppet on parsoid
* 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
* 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
* 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
* 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
* 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
* 00:34 mutante: mw2377, mw2378 - first scap pull
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
* 00:29 legoktm: syncing facts for puppet-compiler
* 00:23 mutante: mw2377, mw2378 - reboot
* 00:14 twentyafterfour: phabricator update complete
* 00:10 twentyafterfour: deploying phabricator
* 00:05 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`


== 2021-03-24 ==
== 2022-09-21 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:46 tgr_: UTC late deploys done
* 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:19 mutante: webperf2001 - restarted apache
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - [[phab:T277865|T277865]] (duration: 01m 07s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - [[phab:T278379|T278379]] (duration: 01m 07s)
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - [[phab:T278375|T278375]] (duration: 01m 07s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:57 ryankemper: [[phab:T267927|T267927]] Host key is missing for `wdqs2008` leading to `data-transfer` cookbook failing, looking into resolving
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:42 ryankemper: [[phab:T267927|T267927]] Re-enabled puppet on `wdqs2008` and ran puppet agent
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 3/3) (duration: 01m 08s)
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 2/3) (duration: 01m 06s)
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 1/3) (duration: 01m 07s)
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 3/3) (duration: 01m 06s)
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 2/3) (duration: 01m 07s)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:52 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 1/3) (duration: 01m 08s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5aa050602954a3cab0c7e0c4b10efb0f957efb59}}: Promote several Growth target wikis out of dark mode ([[phab:T277491|T277491]]; [[phab:T276830|T276830]]; [[phab:T276123|T276123]]; [[phab:T276816|T276816]]; [[phab:T275550|T275550]]; [[phab:T276450|T276450]]) (duration: 01m 08s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|333393dfe59deb0ec4d7df6dd92372a705f65b85}}: Add autopatrol to autoreviewers in en.wikibooks ([[phab:T278300|T278300]]) (duration: 01m 09s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 17:25 effie: upgrade memcached on mc-gp* hosts
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:42 moritzm: reduce RAM for irc2001 to 2G, was originally created with 8 G [[phab:T224579|T224579]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:35 effie: enable puppet on all mediawiki + memcached hosts
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:20 moritzm: drain ganeti2022
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 moritzm: drain ganeti2021
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 moritzm: installing irc1001
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 moritzm: drain ganeti2020
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:28 effie: enabling puppet on mediawiki and memcached servers
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 12:10 jynus: restart dbprov200[12] [[phab:T271913|T271913]]
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 Andrew-WMDE_: EU deploys done
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:53 jynus: restart dbprov100[12] [[phab:T271913|T271913]]
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: [[gerrit:674367{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 08s)
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: [[gerrit:674366{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 07s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673326{{!}}Enable CodeMirror accessibility colors on initial wikis (T276346)]] (duration: 01m 08s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:15 jynus: restart serially db2097 db2098 db2099 db2100 [[phab:T271913|T271913]]
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673312{{!}}Enable bracket matching on group0 and wikitech (T273591)]] (duration: 01m 25s)
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
* 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
* 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
* 10:31 jynus: restart db1171 [[phab:T271913|T271913]]
* 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:14 jynus: restart db1145 [[phab:T271913|T271913]]
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:03 jynus: restart db1139 [[phab:T271913|T271913]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
* 09:51 jynus: restart db1116 [[phab:T271913|T271913]]
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
* 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 08:16 gehel: restarting wdqs updater on all nodes for config change
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
* 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
* 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
* 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
* 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
* 07:09 moritzm: installing squid security updates
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
* 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
* 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
* 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:41 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 03:41 ryankemper: [[phab:T274204|T274204]] Restarting the `codfw` rolling reboot; the timestamp argument should prevent it from wasting time on nodes that have been rebooted already
* 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 03:39 ryankemper: [[phab:T274204|T274204]] Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 02:38 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:59 ryankemper: [[phab:T274204|T274204]] For now I'll proceed to the reboots of `codfw`
* 01:59 ryankemper: [[phab:T274204|T274204]] `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
* 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
* 01:49 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
* 01:36 eileen: civicrm revision changed from {{Gerrit|f36a0b08f0}} to {{Gerrit|ad430721f6}}, config revision is {{Gerrit|26b02db7ba}}
* 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE


== 2021-03-23 ==
== 2022-09-20 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 20:19 cjming: end of UTC late backport window
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 21:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 20:41 eileen: civicrm revision changed from {{Gerrit|39d24e8b0a}} to {{Gerrit|f36a0b08f0}}, config revision is {{Gerrit|26b02db7ba}}
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 20:24 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 20:24 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 20:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet
* 18:50 jynus: restart db2100:s7 to apply new config
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 20:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts auth1002.eqiad.wmnet
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:51 jforrester@deploy1002: Finished deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans (duration: 00m 08s)
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:51 jforrester@deploy1002: Started deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove schema overrides for 6 finished EL migrations - [[phab:T267347|T267347]] [[phab:T271164|T271164]] [[phab:T267351|T267351]] [[phab:T267348|T267348]] [[phab:T267343|T267343]] [[phab:T267353|T267353]] (duration: 01m 07s)
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:40 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/vendor/: Bump wikimedia/parsoid to 0.13.0-a29 (duration: 01m 16s)
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:20 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 18:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:16 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:10 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add irc2001.wikimedia.org (running buster) as second irc server ([[phab:T224579|T224579]]) (duration: 01m 08s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:32 moritzm: installing libsdl2 security updates
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 15:31 akosiaris: pool echostore for eqiad (the first of the larger services traffic wise)
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:31 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=echostore
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 15:25 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T274200|T274200]])
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 14:43 akosiaris: pool more services in eqiad k8s. [[phab:T277741|T277741]]. Only the very large ones traffic wise are still on codfw
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=recommendation-api
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=push-notifications
* 16:42 dancy@deploy1002: Sync cancelled.
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=proton
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mobileapps
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 16:09 awight@deploy1002: backport aborted: (duration: 00m 33s)
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=linkrecommendation
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams-internal
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:20 akosiaris: pool a few more services in eqiad k8s. [[phab:T277741|T277741]]
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=wikifeeds
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=termbox
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=similar-users
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 14:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 akosiaris: pool a few services in eqiad k8s. [[phab:T277741|T277741]]
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
*