You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0))
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json)
 
(602 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2020-12-02 ==
== 2022-09-28 ==
* 00:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 00:48 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}
* 00:16 Urbanecm: Evening B&C window done
* 00:14 bstorm: created views and wikireplicas indexes on clouddb10[13-19] sans s1 [[phab:T268312|T268312]]
* 00:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c73f0bf0d1cdc1c7441261ffb1ad8ae12aa92ec9}}: Enable watchlist expiry feature on all wikis ([[phab:T266875|T266875]]) (duration: 01m 07s)
* 00:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-12-01 ==
== 2022-09-27 ==
* 23:15 ryankemper: [[phab:T259588|T259588]] Beginning wdqs categories data-reload on the following instances (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1005`, `wdqs2002`, `wdqs1008`, `wdqs2005`
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:13 razzi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:13 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:13 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:12 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 22:41 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=arwiki; [[phab:T246539|T246539]])
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 22:15 rzl@cumin2001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 22:13 rzl@cumin2001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 21:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:21 hashar: gerrit2001: restarting Gerrit to take in account a config change in the daemon ( --replica moved to daemonOpt config file)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:18 mutante: applied deployment_server role on deploy2002, added mcrouter cert, initial puppet run pulls mediawiki-config and other repos, downtimed in Icinga for 40 days ([[phab:T265963|T265963]])
* 21:12 TheresNoTime: closing UTC late backport window
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 eileen: civicrm revision changed from {{Gerrit|fb0ad7f39b}} to {{Gerrit|a2979cbba1}}, config revision is {{Gerrit|111cf0d63d}}
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:56 razzi@deploy1001: Finished deploy [analytics/refinery@41c60d9] (thin): Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184] (duration: 00m 07s)
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 19:56 razzi@deploy1001: Started deploy [analytics/refinery@41c60d9] (thin): Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184]
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 19:54 razzi@deploy1001: Finished deploy [analytics/refinery@41c60d9]: Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184] (duration: 08m 45s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:51 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 19:45 razzi@deploy1001: Started deploy [analytics/refinery@41c60d9]: Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184]
* 20:59 TheresNoTime: extending UTC late backport window
* 19:44 razzi: deploy refinery with refinery-source v0.0.140
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:40 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:40 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 19:36 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 19:35 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 ejegg: updated payments-wiki from {{Gerrit|8612ed1002}} to {{Gerrit|756c2f7ce0}}
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 19:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:07 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 18:45 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 18:40 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaApiPortalOAuth on apiportalwiki gerrit:644305 (duration: 01m 06s)
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 18:38 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaApiPortalOAuth on apiportalwiki gerrit:644305 (duration: 01m 06s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:03 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable session length instrument on officewiki [[phab:T267494|T267494]] (duration: 01m 06s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/WikimediaEvents: Backport: sessionTick: Update stream name to mw_session_tick (duration: 01m 04s)
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 17:54 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/WikimediaEvents: Backport: sessionTick: Update stream name to mw_session_tick (duration: 01m 07s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:34 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for android.user_contributions_screen [[phab:T228179|T228179]] (duration: 01m 07s)
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:19 marostegui: Sanitize s1 on clouddb1013 and clouddb1017 - [[phab:T267090|T267090]]
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 16:09 moritzm: installing vips security updates
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 15:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:10 hashar@deploy1001: Finished scap: (no justification provided) (duration: 44m 20s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:59 jbond42: install libonig updates to scp
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:51 jbond42: instal lxml updates
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:26 hashar@deploy1001: Started scap: (no justification provided)
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:24 hashar@deploy1001: sync-world aborted: testwikis wikis to 1.36.0-wmf.20 (duration: 74m 55s)
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 14:08 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:05 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 member 2 port 53 - [[phab:T268808|T268808]]
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:00 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 member 2 port 53 - [[phab:T268808|T268808]]
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 13:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to clone clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13507 and previous config saved to /var/cache/conftool/dbconfig/20201201-133917-marostegui.json
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.20
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 12:57 hashar: Preparing deployment of 1.36.0-wmf.20 # [[phab:T263186|T263186]]
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 12:38 moritzm: uploaded libonig 5.9.5-3.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 12:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:05 arturo: [11:53 moritzm] uploaded lxml 3.4.0-1+deb8u1+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:48 marostegui: Install bsd-mailx on the new clouddb hosts (needed for the check private data) [[phab:T267090|T267090]] [[phab:T268725|T268725]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13506 and previous config saved to /var/cache/conftool/dbconfig/20201201-110214-root.json
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13505 and previous config saved to /var/cache/conftool/dbconfig/20201201-104710-root.json
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:38 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13503 and previous config saved to /var/cache/conftool/dbconfig/20201201-103207-root.json
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 10:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 10:30 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13501 and previous config saved to /var/cache/conftool/dbconfig/20201201-101703-root.json
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P13500 and previous config saved to /var/cache/conftool/dbconfig/20201201-101346-marostegui.json
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 10:08 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13499 and previous config saved to /var/cache/conftool/dbconfig/20201201-100541-root.json
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13498 and previous config saved to /var/cache/conftool/dbconfig/20201201-095037-root.json
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:49 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13497 and previous config saved to /var/cache/conftool/dbconfig/20201201-093534-root.json
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 09:35 volans: upgrading spicerack to 0.0.45 on cumin1001
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 09:32 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 09:21 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13496 and previous config saved to /var/cache/conftool/dbconfig/20201201-092030-root.json
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 09:05 moritzm: removing obsolete resources on idp* and idp-test* hosts after going active-active
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P13495 and previous config saved to /var/cache/conftool/dbconfig/20201201-085916-marostegui.json
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 08:18 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 08:11 volans@cumin2001: START - Cookbook sre.dns.netbox
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 08:10 volans: upgrading spicerack to 0.0.45 on cumin2001
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13494 and previous config saved to /var/cache/conftool/dbconfig/20201201-081002-root.json
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 08:05 marostegui: Create database mwaddlink on m2 - [[phab:T267214|T267214]]
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13493 and previous config saved to /var/cache/conftool/dbconfig/20201201-075458-root.json
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13492 and previous config saved to /var/cache/conftool/dbconfig/20201201-073955-root.json
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 07:31 marostegui: Deploy "_p" databases to all clouddb hosts (except clouddb1020*) [[phab:T268312|T268312]]
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13491 and previous config saved to /var/cache/conftool/dbconfig/20201201-072451-root.json
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:15 marostegui: Deploy labsdb role on all clouddb instances (except clouddb1020*) [[phab:T268312|T268312]]
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1017 from dbctl [[phab:T268825|T268825]]', diff saved to https://phabricator.wikimedia.org/P13490 and previous config saved to /var/cache/conftool/dbconfig/20201201-065419-marostegui.json
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P13489 and previous config saved to /var/cache/conftool/dbconfig/20201201-065125-marostegui.json
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1018 from dbctl [[phab:T269069|T269069]]', diff saved to https://phabricator.wikimedia.org/P13488 and previous config saved to /var/cache/conftool/dbconfig/20201201-061321-marostegui.json
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1017 and es1018 for reboot', diff saved to https://phabricator.wikimedia.org/P13487 and previous config saved to /var/cache/conftool/dbconfig/20201201-060313-marostegui.json
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 04:13 legoktm: resetting elukey's jenkins API token ([[phab:T268978|T268978]])
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 01:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 01:55 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 01:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 01:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 01:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 01:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 00:22 ryankemper: [[phab:T259588|T259588]] Beginning wdqs categories data-reload on the following instances (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1004`, `wdqs2001`, `wdqs1003`, `wdqs2004`
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 00:20 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 00:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 00:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 00:17 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2020-11-30 ==
== 2022-09-26 ==
* 23:12 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:08 mutante: parse2001 - sudo -i /usr/local/sbin/restart-php7.2-fpm
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:08 mutante: sudo -i /usr/local/sbin/restart-php7.2-fpm
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 22:45 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 22:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove 1.34 from $wgExtDistSnapshotRefs [[phab:T268931|T268931]] (duration: 00m 57s)
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 22:34 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 22:21 cdanis@deploy1001: Synchronized docroot/thankyou: Also serve apple-app-site-assoc file from /.well-known/ [[phab:T259312|T259312]] {{Gerrit|bc52d1481}} (duration: 00m 57s)
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 22:15 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:14 mutante: parse2001 - systemctl restart ferm - had to restart ferm after reimaging (though there weren't any alerts about that) but it fixed running httpbb tests on it ([[phab:T268524|T268524]])
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 22:13 ejegg: extended and re-synchronized timing of thank you mail sender and donation queue consumer
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 21:51 mutante: parse2001 - scap pull
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:45 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 21:38 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 TheresNoTime: closing UTC late backport window
* 20:47 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 20:47 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 mutante: reimaging deploy2002 with buster (not active, deploy1001/2001 are) [[phab:T265963|T265963]]
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 20:39 mutante: reimaging parse2001 (parsoid canary) with buster ([[phab:T268524|T268524]])
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:36 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:33 mutante: depooling parse2001 to prepare for reimage [[phab:T268524|T268524]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 mutante: reimaging deploy1002 with buster - not the active deployment server, deploy1001 still is ([[phab:T265963|T265963]])
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:10 ariel@deploy1001: Finished deploy [dumps/dumps@2f4d931]: per job batches for page content. step one. (duration: 00m 04s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 ariel@deploy1001: Started deploy [dumps/dumps@2f4d931]: per job batches for page content. step one.
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:52 papaul: power down ms-be2059 for RAID re-configuration
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 19:47 mutante: added Sukhbir to Ops vendor maintenance calendar permissions to make changes and share like all of SRE ([[phab:T229860|T229860]])
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 19:23 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:644236 Decrease OAuth token expiration (duration: 00m 56s)
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 19:17 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:644243 group2: switch ParserCache to JSON (duration: 00m 58s)
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 19:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 19:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 17:47 joal@deploy1001: Finished deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d] (duration: 00m 08s)
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 17:47 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 17:47 joal@deploy1001: Started deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d]
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 17:43 joal@deploy1001: Finished deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d] (duration: 11m 32s)
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 17:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 17:31 joal@deploy1001: Started deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d]
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 17:07 moritzm: reset failed (now obsolete idp-u2f-sync/stunnel4 services on idp1001
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 16:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1008.eqiad.wmnet
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 16:24 volans: uploaded spicerack_0.0.45 to apt.wikimedia.org buster-wikimedia
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 16:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet (duration: 01m 15s)
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 16:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 15:43 moritzm: installing tomcat8 security updates
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1007.eqiad.wmnet
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:34 ema: cp3054: upgrade varnish to 6.0.7-1wm1 [[phab:T268736|T268736]] [[phab:T264398|T264398]]
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:28 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on all wikis - [[phab:T268517|T268517]] (duration: 00m 56s)
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 15:15 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on testwiki - [[phab:T268517|T268517]] (duration: 01m 16s)
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883] (duration: 00m 08s)
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:55 joal@deploy1001: Started deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883]
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883] (duration: 09m 26s)
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 14:45 joal@deploy1001: Started deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883]
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 14:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1006.eqiad.wmnet
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13481 and previous config saved to /var/cache/conftool/dbconfig/20201130-143232-root.json
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 14:23 marostegui: Deploy schema change on s3 codfw, lag will show up on s3 codfw [[phab:T268004|T268004]]
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P13480 and previous config saved to /var/cache/conftool/dbconfig/20201130-141953-marostegui.json
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13479 and previous config saved to /var/cache/conftool/dbconfig/20201130-141729-root.json
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13478 and previous config saved to /var/cache/conftool/dbconfig/20201130-141146-marostegui.json
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13477 and previous config saved to /var/cache/conftool/dbconfig/20201130-140226-root.json
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 13:58 ema: varnish 6.0.7-1wm1 uploaded to apt.wikimedia.org component/varnish6 [[phab:T268736|T268736]]
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13475 and previous config saved to /var/cache/conftool/dbconfig/20201130-134841-marostegui.json
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13474 and previous config saved to /var/cache/conftool/dbconfig/20201130-134722-root.json
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 13:23 jbond42: update zeromq on jessie hosts
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 13:21 dcausse: depooling wdqs1004 (lag)
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 13:18 moritzm: CAS enabled for racktables
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 13:16 gilles@deploy1001: Synchronized debug.json: [[phab:T268167|T268167]] Add mwdebug1003 to list of debug servers (duration: 00m 56s)
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 12:50 Urbanecm: EU B&C window done
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 12:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3476644e4c27dd28339f7b10c8871be2e9455394}}: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 ([[phab:T268849|T268849]]; 2nd attempt) (duration: 00m 56s)
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 12:43 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic (duration: 00m 24s)
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 12:43 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 12:41 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5585fd79119a4e077705789d1c1928c9e9efa956}}: Enable RelatedArticles on ptwikinews ([[phab:T268945|T268945]]) (duration: 00m 57s)
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba6d0f8fd2a443e5c913a292365063f01f2d076b}}: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 ([[phab:T268849|T268849]]) (duration: 00m 57s)
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 12:37 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts (duration: 02m 05s)
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9942d68c914a56073d2d192434ba24ff8cb921ba}}: Assign patrolmarks right to autoconfirmed users on itwiki ([[phab:T268734|T268734]]) (duration: 00m 57s)
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 12:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 12:35 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 24s)
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:34 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 51s)
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 12:33 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 12:32 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:27 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: New eqiad maps hosts (duration: 00m 03s)
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:27 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: New eqiad maps hosts
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 12:24 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts (duration: 00m 03s)
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:24 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|2922abe7b810f1b53b446af783dc9d51e6585225}}: Remove wgContentTranslationRESTBase config ([[phab:T266213|T266213]]) (duration: 00m 57s)
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 11:43 marostegui: Sanitize clouddb1016:3318 - [[phab:T267090|T267090]]
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 11:38 ema: A:cp upgrade fifo-log-demux to 0.6.2 [[phab:T268883|T268883]]
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 11:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:644196{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 11:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:644196{{!}} Bumping portals to master (T128546)]] (duration: 01m 01s)
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 11:32 ariel@deploy1001: Finished deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir (duration: 00m 04s)
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 11:32 ariel@deploy1001: Started deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 11:28 ema: upload fifo-log-demux 0.6.2 to buster-wikimedia [[phab:T268883|T268883]]
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13473 and previous config saved to /var/cache/conftool/dbconfig/20201130-111321-root.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 11:00 hnowlan: bootstrapping maps1005 cassandra
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13472 and previous config saved to /var/cache/conftool/dbconfig/20201130-105818-root.json
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13471 and previous config saved to /var/cache/conftool/dbconfig/20201130-104314-root.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 10:29 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:29 marostegui: Compare data between clouddb1014:3312 clouddb1018:3312 labsdb1012 [[phab:T267090|T267090]]
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 10:29 marostegui: Compare data between clouddb1012:3312 clouddb1018:3312 labsdb1012 [[phab:T267090|T267090]]
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13470 and previous config saved to /var/cache/conftool/dbconfig/20201130-102811-root.json
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 10:24 akosiaris: applying https://gerrit.wikimedia.org/r/q/topic:%22k8s_config%22 series of patches
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:18 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 10:18 ema: cp4031: reboot to test atsmtail/fifo-log-demux service dependencies -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/643922 [[phab:T256467|T256467]]
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 10:11 ema: cp4032: upgrade varnish to 6.0.7-1wm1 [[phab:T268736|T268736]]
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:06 moritzm: installing NSS security updates
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P13469 and previous config saved to /var/cache/conftool/dbconfig/20201130-095729-marostegui.json
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13468 and previous config saved to /var/cache/conftool/dbconfig/20201130-095621-root.json
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13467 and previous config saved to /var/cache/conftool/dbconfig/20201130-094117-root.json
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:40 marostegui: Stop MySQL on db1087 to clone clouddb1016:3318 [[phab:T267090|T267090]])
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from s8 and pool db1092 instead temporarily on vslow [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13466 and previous config saved to /var/cache/conftool/dbconfig/20201130-093909-marostegui.json
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13465 and previous config saved to /var/cache/conftool/dbconfig/20201130-092614-root.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1089+ (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13464 and previous config saved to /var/cache/conftool/dbconfig/20201130-092154-root.json
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:51 marostegui: Deploy schema change on db1089
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P13463 and previous config saved to /var/cache/conftool/dbconfig/20201130-085101-marostegui.json
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:41 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 08:36 marostegui: Compare data between clouddb1016:3315 labsdb1012 [[phab:T267090|T267090]]
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 07:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 07:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 07:11 marostegui: Deploy schema change on s1 codfw - [[phab:T268004|T268004]]
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 07:05 marostegui: Stop mysql on db1124:3318 to clone clouddb1016:3318, lag will show up on wikireplicas on s8 [[phab:T267090|T267090]]
* 13:56 moritzm: installing mako security updates
* 06:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 04:26 kart_: Updated cxserver to  2020-11-23-050106-production ([[phab:T262253|T262253]], [[phab:T268410|T268410]])
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 04:18 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 04:14 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:11 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and  shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-11-27 ==
== 2022-09-25 ==
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 15:50 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 15:06 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 14:50 elukey: roll restart zookeeper on druid* nodes for openjdk upgrades
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 14:50 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 10:52 jayme: updated helmfile to 0.135.0-1 on deploy*,contint*
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 10:51 jayme: updated helm-diff to 3.1.3-1 on contint*
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 10:49 jayme: updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum*
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 10:06 jayme: updated helm and helmfile on deploy2001
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 10:04 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 10:00 jayme: imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 08:05 elukey: roll restart druid public cluster for openjdk upgrades
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 06:39 marostegui: Stop mysql on es1015 [[phab:T268810|T268810]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json
* 06:30 marostegui: Remove es1016 from tendril and zarcillo [[phab:T268812|T268812]]
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for decommissioning [[phab:T268810|T268810]]', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json


== 2020-11-26 ==
== 2022-09-23 ==
* 17:18 jayme: downgrade helmfile to 0.125.2-1 on deploy*
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 17:05 jayme: updated helm-diff and helmfile on deploy100* and deploy200*
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 16:34 jayme: imported helm-diff 3.1.3-1 into buster-wikimedia and stretch-wikimedia
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 15:01 moritzm: installing libonig security updates
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13452 and previous config saved to /var/cache/conftool/dbconfig/20201126-144446-root.json
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 14:38 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:36 moritzm: installing zeromq3 security updates for stretch
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 jbond42: failing idp back to idp2001
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13451 and previous config saved to /var/cache/conftool/dbconfig/20201126-142942-root.json
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 14:23 moritzm: remove labtestpuppetmaster2001 from debmonitor [[phab:T258103|T258103]]
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13450 and previous config saved to /var/cache/conftool/dbconfig/20201126-141439-root.json
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 13:52 elukey: roll restart druid daemons on druid analytics to pick up new openjdk upgrades
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 13:52 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 13:52 root@cumin1001: START - Cookbook sre.hosts.downtime
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 13:52 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 13:50 moritzm: installing python3.5 security updates
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P13449 and previous config saved to /var/cache/conftool/dbconfig/20201126-133204-marostegui.json
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13448 and previous config saved to /var/cache/conftool/dbconfig/20201126-132918-root.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 13:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13447 and previous config saved to /var/cache/conftool/dbconfig/20201126-131414-root.json
* 13:07 hnowlan: testing depooling kartotherian on maps2004 to reduce load
* 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:01 jbond42: update puppet_compiler on compiler1003
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13446 and previous config saved to /var/cache/conftool/dbconfig/20201126-125911-root.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P13445 and previous config saved to /var/cache/conftool/dbconfig/20201126-124253-marostegui.json
* 12:31 jbond42: fail over idp.wikimedia.org
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:53 moritzm: rebooting seaborgium for kernel update
* 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:40 marostegui: Deploy schema change on s8 codfw - there will be lag on s8 codfw - [[phab:T268004|T268004]]
* 11:16 moritzm: restarting archiva to pick up Java security update
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13442 and previous config saved to /var/cache/conftool/dbconfig/20201126-104324-root.json
* 10:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13441 and previous config saved to /var/cache/conftool/dbconfig/20201126-102820-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13440 and previous config saved to /var/cache/conftool/dbconfig/20201126-101317-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13439 and previous config saved to /var/cache/conftool/dbconfig/20201126-095813-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P13438 and previous config saved to /var/cache/conftool/dbconfig/20201126-094729-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P13437 and previous config saved to /var/cache/conftool/dbconfig/20201126-094702-marostegui.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P13436 and previous config saved to /var/cache/conftool/dbconfig/20201126-094639-marostegui.json
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13435 and previous config saved to /var/cache/conftool/dbconfig/20201126-094538-root.json
* 09:38 marostegui: Stop mysql on es1016 for decommission
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13434 and previous config saved to /var/cache/conftool/dbconfig/20201126-093035-root.json
* 09:26 ema: deployment-cache-text06: upgrade Varnish to 6.0.7-1wm1 [[phab:T268736|T268736]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13433 and previous config saved to /var/cache/conftool/dbconfig/20201126-091532-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13432 and previous config saved to /var/cache/conftool/dbconfig/20201126-090028-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P13431 and previous config saved to /var/cache/conftool/dbconfig/20201126-084903-marostegui.json
* 08:40 elukey: roll restart cassandra on aqs10* for openjdk upgrades
* 08:40 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 08:09 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:08 marostegui: Deploy schema change on s7 codfw - there will be lag on s7 codfw - [[phab:T268004|T268004]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13430 and previous config saved to /var/cache/conftool/dbconfig/20201126-072506-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13429 and previous config saved to /var/cache/conftool/dbconfig/20201126-071514-root.json
* 07:12 marostegui: Enable GTID on clouddb1018:3317 clouddb1014:3317 [[phab:T267090|T267090]]
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13428 and previous config saved to /var/cache/conftool/dbconfig/20201126-071003-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13427 and previous config saved to /var/cache/conftool/dbconfig/20201126-070010-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13426 and previous config saved to /var/cache/conftool/dbconfig/20201126-065500-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13425 and previous config saved to /var/cache/conftool/dbconfig/20201126-064507-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13424 and previous config saved to /var/cache/conftool/dbconfig/20201126-063956-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1016 from dbctl', diff saved to https://phabricator.wikimedia.org/P13423 and previous config saved to /var/cache/conftool/dbconfig/20201126-063234-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13422 and previous config saved to /var/cache/conftool/dbconfig/20201126-063003-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13421 and previous config saved to /var/cache/conftool/dbconfig/20201126-062811-marostegui.json
* 06:17 marostegui: Stop mysql on db1124:3315 to clone clouddb1016:3315 [[phab:T267090|T267090]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for schema change', diff saved to https://phabricator.wikimedia.org/P13420 and previous config saved to /var/cache/conftool/dbconfig/20201126-061552-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P13419 and previous config saved to /var/cache/conftool/dbconfig/20201126-061459-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P13418 and previous config saved to /var/cache/conftool/dbconfig/20201126-061432-marostegui.json
* 06:08 ryankemper: [[phab:T268770|T268770]] [eqiad] Finished rolling restart of cirrus eqiad. All cirrus elasticsearch restarts are now complete (cloudelastic, relforge, eqiad, codfw)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 04:24 ryankemper: [[phab:T268770|T268770]] [eqiad] Begin rolling restart of cirrus eqiad, 3 nodes at a time
* 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 03:07 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I805699ecfa}} (duration: 00m 58s)


== 2020-11-25 ==
== 2022-09-22 ==
* 23:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 22:55 mutante: mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script ([[phab:T245757|T245757]])
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 22:47 ejegg: increased Ingenico API call timeout
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 shdubsh: beginning rolling restart of logstash cluster - eqiad
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:23 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 22:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:19 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:49 krinkle@deploy1001: Synchronized php-1.36.0-wmf.18/includes/libs/CSSMin.php: {{Gerrit|I26ed3e5e9a}} - fix [[phab:T268308|T268308]] (duration: 00m 59s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mutante: LDAP added user duminasi to group wmf ([[phab:T266791|T266791]])
* 20:55 brennen: end of utc late backport & config window
* 20:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 elukey: upload new hive* packages 2.2.3-2 to stretch-wikimedia - thirdparty/bigtop14 component
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 18:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 18:38 mutante: LDAP adding swagoel to NDA [[phab:T267314|T267314]]#6625628
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 18:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:05 ryankemper: [[phab:T268770|T268770]] [cloudelastic] Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 18:01 ryankemper: [cloudelastic] (forgot to mention this) Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
* 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
* 17:58 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts complete, service is healthy. This is done.
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:55 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1006` complete and all 3 elasticsearch clusters are green, all cloudelastic instances are now complete
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:49 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1005` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:44 shdubsh: beginning rolling restart of logstash cluster - codfw
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:44 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1004` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:39 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1003` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1002` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:28 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1001` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 ryankemper: [[phab:T268770|T268770]] Freezing writes to cloudelastic in preparation for restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint1002`
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 17:09 ryankemper: [[phab:T268770|T268770]] [cloudelastic] Downtimed `cloudelastic100[1-6]` in icinga in preparation for cloudelastic search elasticsearch cluster restart
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:05 ryankemper: [[phab:T268770|T268770]] Begin rolling restart of eqiad cirrus elasticsearch, 3 nodes at a time
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 17:04 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:00 godog: fail sdk on ms-be2031
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:49 godog: clean up sdk1 on / on ms-be2031
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:46 elukey: move analytics1066 to C3 - [[phab:T267065|T267065]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:21 mutante: puppetmaster - revoking old and signing new cert for mwdebug1003
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 16:11 elukey: move analytics1065 to C3 - [[phab:T267065|T267065]]
* 18:38 dancy@deploy1002: Started scap: testing
* 16:10 mutante: shutting down mwdebug1003 - reimaging for [[phab:T245757|T245757]]
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 16:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 16:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 16:02 moritzm: installing golang-1.7 updates for stretch
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 15:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 15:38 elukey: move stat1004 to A5 - [[phab:T267065|T267065]]
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:34 moritzm: removing maps2002 from debmonitor
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:10 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:04 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 moritzm: installing krb5 security updates for Buster
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:39 dancy@deploy1002: Sync cancelled.
* 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 14:00 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:44 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:43 akosiaris: assign IPs to kubestage200<nowiki>{</nowiki>1,2,3<nowiki>}</nowiki>.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox [[phab:T268747|T268747]]
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 marostegui: Deploy schema change on commonswiki.watchlist on s4 codfw - there will be lag on s4 codfw - [[phab:T268004|T268004]]
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:08 akosiaris: assign IPs to kubestage200<nowiki>{</nowiki>1,2,3<nowiki>}</nowiki>.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13414 and previous config saved to /var/cache/conftool/dbconfig/20201125-124202-root.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13413 and previous config saved to /var/cache/conftool/dbconfig/20201125-122659-root.json
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13412 and previous config saved to /var/cache/conftool/dbconfig/20201125-121155-root.json
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13411 and previous config saved to /var/cache/conftool/dbconfig/20201125-115652-root.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:49 gilles@deploy1001: Finished deploy [performance/coal@be167b2]: [[phab:T268724|T268724]] (duration: 00m 06s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:48 gilles@deploy1001: Started deploy [performance/coal@be167b2]: [[phab:T268724|T268724]]
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P13408 and previous config saved to /var/cache/conftool/dbconfig/20201125-114717-marostegui.json
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 11:27 gilles@deploy1001: Finished deploy [performance/coal@468bc50]: [[phab:T268724|T268724]] (duration: 00m 06s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:27 gilles@deploy1001: Started deploy [performance/coal@468bc50]: [[phab:T268724|T268724]]
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:27 jbond42: install krb5 updates to jessie hosts
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 10:52 jbond42: failover idp primary to idp2001
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:51 kormat: deployed wmfmariadbpy 0.6.1 to `C:wmfmariadbpy`
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 10:43 kormat: uploaded wmfmariadbpy 0.6.1 to stretch+buster apt repos
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:21 jynus: upgrade wmfbackup-check package on alert* hosts
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 10:11 kormat: uploaded wmfmariadbpy 0.6 to stretch+buster apt repos
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 09:54 moritzm: uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:45 marostegui: Manually install apt-get install bsd-mailx on clouddb1015, labsdb1012 and labsdb1011 - [[phab:T268725|T268725]]
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json
* 08:43 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 [[phab:T268469|T268469]] (duration: 00m 59s)
* 08:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json
* 08:14 kormat: rebooting es1024 [[phab:T268469|T268469]]
* 08:08 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:07 kormat: stopping mariadb on es1024 [[phab:T268469|T268469]]
* 08:04 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es5 [[phab:T268469|T268469]] (duration: 00m 58s)
* 08:02 marostegui: Upgrade db2108
* 08:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json
* 06:38 marostegui: Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 [[phab:T267090|T267090]]
* 06:33 marostegui: Restart clouddb1019:3314, clouddb1019:3316
* 06:32 marostegui: Restart clouddb1015:3314, clouddb1015:3316
* 06:28 marostegui: Check private data on clouddb1014:3312 and clouddb1018:3312 [[phab:T267090|T267090]]
* 05:48 marostegui: Sanitize clouddb1014:3312 and clouddb1018:3312 [[phab:T267090|T267090]]
* 01:10 tgr_: Evening deploys done
* 01:07 tgr@deploy1001: Finished scap: Backport: [[gerrit:643156{{!}}GrowthExperiments: Add Russian aliases (T268519)]] (duration: 32m 09s)
* 00:35 tgr@deploy1001: Started scap: Backport: [[gerrit:643156{{!}}GrowthExperiments: Add Russian aliases (T268519)]]


== 2020-11-24 ==
== 2022-09-21 ==
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:50 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] (duration: 01m 51s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:48 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]]
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:27 andrewbogott: restarting slapd on serpens
* 20:46 tgr_: UTC late deploys done
* 21:20 cdanis: ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:17 andrewbogott: restarting slapd on seaborgium
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - [[phab:T254606|T254606]] (duration: 01m 01s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:49 ryankemper: [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service`
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:30 gilles@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/NavigationTiming/extension.json: (no justification provided) (duration: 00m 57s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|331a129}}: Remove temporary feature flags ([[phab:T258116|T258116]]) (duration: 00m 57s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:20 mutante: LDAP - added derick to group nda ([[phab:T268150|T268150]])
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 19:17 moritzm: installing Java security updates on elastic* and relforge*
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:643260 group1: Switch ParserCache to JSON (duration: 00m 57s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:59 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:56 elukey@deploy1001: Finished deploy [analytics/refinery@1ff0868]: Regular analytics weekly train (duration: 09m 50s)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:56 volans: migrating anycast zonefile to the Netbox-generated ones - [[phab:T258729|T258729]]
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 18:55 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 18:51 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:46 elukey@deploy1001: Started deploy [analytics/refinery@1ff0868]: Regular analytics weekly train
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 18:46 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:45 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] (duration: 01m 09s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:45 elukey: restart memcached on mw2339 to pick up the correct port (was bound on 11211 rather than 11210)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 18:44 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]]
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:19 ejegg: updated Fundraising CiviCRM from {{Gerrit|28464df973}} to {{Gerrit|fb0ad7f39b}}
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 18:07 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 18:06 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 18:04 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:51 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:10 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:29 elukey: move analytics1064 from C2 to C3 eqiad - [[phab:T267065|T267065]]
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 16:06 hnowlan: finished removing restbase2009 from cassandra cluster
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 16:01 cmjohnson1: replacing the sfp at cr1-eqiad xe-3/2/1 [[phab:T267672|T267672]]
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 15:42 marostegui: Drop kraken user from s4 - [[phab:T268636|T268636]]
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 15:38 elukey: move druid1005 from rack B7 to B6 - [[phab:T267065|T267065]]
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 15:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 15:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:28 jayme: pushed docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0 docker-registry.discovery.wmnet/calico/node:v3.17.0 docker-registry.discovery.wmnet/calico/typha:v3.17.0
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:23 jayme: imported calico 3.17.0 into component/calico-future for stretch-wikimedia
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 15:07 godog: swift eqiad-prod: decom ms-be1022 ssd from swift - [[phab:T267870|T267870]]
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:01 marostegui: Enable GTID on clouddb1013:3311 clouddb1015:3314 clouddb1017:3311 clouddb1019:3314 [[phab:T267090|T267090]]
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 14:58 elukey: move analytics1072 from rack B2 to B3 - [[phab:T267065|T267065]]
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:53 jayme: imported helmfile 0.135.0-1 into buster-wikimedia and stretch-wikimedia
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 14:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P13392 and previous config saved to /var/cache/conftool/dbconfig/20201124-144219-marostegui.json
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 14:34 liw: finished testing Scap on Beta cluster in prep for https://phabricator.wikimedia.org/T268634
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:31 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:27 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13391 and previous config saved to /var/cache/conftool/dbconfig/20201124-141912-root.json
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 moritzm: reset-failed idp-u2f.service after Hiera change (one time issue, will soon be obsolete)
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13390 and previous config saved to /var/cache/conftool/dbconfig/20201124-140409-root.json
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:52 elukey@deploy1001: Finished deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252 (duration: 00m 05s)
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:52 elukey@deploy1001: Started deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13389 and previous config saved to /var/cache/conftool/dbconfig/20201124-134905-root.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 marostegui: Stop MySQL on db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13388 and previous config saved to /var/cache/conftool/dbconfig/20201124-133709-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13387 and previous config saved to /var/cache/conftool/dbconfig/20201124-133402-root.json
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 13:13 jgleeson: civicrm revision is {{Gerrit|28464df973}}, config revision is {{Gerrit|928918a9b6}}
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:01 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.18
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:01 liw: done testing Scap release candidate on beta (failed: disk full on deploy01)
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:49 hnowlan: disabled cassandra service on restbase2009, starting drain
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:30 liw: testing upcoming Scap release on beta
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 11:59 jayme: imported helm3 3.4.1-1 into buster-wikimedia and stretch-wikimedia
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 11:52 XioNoX: push CR641949 and CR641949
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:38 effie: rolling depool and pool app and api clusters - [[phab:T244340|T244340]]
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:25 _joe_: rebuild docker images for [[phab:T268612|T268612]]
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:20 effie: disable puppet on api and app servers to rollout onhost memcached - [[phab:T244340|T244340]]
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:14 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 marostegui: Stop mysql on db1125:3312 to clone clouddb1014:3312 and clouddb1018:3312 - [[phab:T267090|T267090]]
* 10:45 moritzm: upgrading seaborgium to Buster
* 10:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:31 jbond42: up0load new cas package to wikimedia-buster
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2073', diff saved to https://phabricator.wikimedia.org/P13384 and previous config saved to /var/cache/conftool/dbconfig/20201124-100139-marostegui.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2026', diff saved to https://phabricator.wikimedia.org/P13383 and previous config saved to /var/cache/conftool/dbconfig/20201124-100020-marostegui.json
* 09:48 volans: Migrating codfw private/public primary DNS records to the auto-generated ones from Netbox - [[phab:T258729|T258729]]
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13382 and previous config saved to /var/cache/conftool/dbconfig/20201124-094449-marostegui.json
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P13381 and previous config saved to /var/cache/conftool/dbconfig/20201124-094159-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13380 and previous config saved to /var/cache/conftool/dbconfig/20201124-094052-marostegui.json
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P13379 and previous config saved to /var/cache/conftool/dbconfig/20201124-093517-marostegui.json
* 09:23 marostegui: Deploy schema change on db2114 and db1096:3316 - [[phab:T268004|T268004]]
* 09:13 ema: cp4032: switch back to varnish 6.0.6-1wm2 after [[phab:T264398|T264398]] experiment, fix [[phab:T268243|T268243]]
* 09:09 elukey: drop principals and keytabs for analytics10[42-57] - [[phab:T267932|T267932]]
* 09:03 gilles@deploy1001: Finished deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)
* 09:03 gilles@deploy1001: Started deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it
* 08:49 _joe_: uploading the base production docker images for MediaWiki, [[phab:T265324|T265324]]
* 08:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:43 _joe_: refreshing debian buster base image
* 08:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:31 marostegui: Deploy user for pki database for dbproxy1012, dbproxy1014, dbproxy2001 - [[phab:T268329|T268329]]
* 08:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:27 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:58 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13378 and previous config saved to /var/cache/conftool/dbconfig/20201124-074342-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13377 and previous config saved to /var/cache/conftool/dbconfig/20201124-073202-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13376 and previous config saved to /var/cache/conftool/dbconfig/20201124-073125-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13375 and previous config saved to /var/cache/conftool/dbconfig/20201124-072755-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13374 and previous config saved to /var/cache/conftool/dbconfig/20201124-072715-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13373 and previous config saved to /var/cache/conftool/dbconfig/20201124-072249-marostegui.json
* 07:00 _joe_: changing the mtail recipe for mediawiki/apache to use an actual histogram
* 06:31 marostegui: Sanitize clouddb1019:3314 [[phab:T267090|T267090]]
* 06:28 marostegui: Sanitize clouddb1015:3314 [[phab:T267090|T267090]]
* 03:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 05s)
* 00:29 reedy@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 06s)


== 2020-11-23 ==
== 2022-09-20 ==
* 22:56 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:19 cjming: end of UTC late backport window
* 22:52 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 22:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:54 mutante: mwdebug1003 - removing php packages and letting puppet reinstall them after it has the correct APT config [[phab:T267248|T267248]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 21:26 mutante: mwdebug1003 - scap pull because <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mwdebug1003 is CRITICAL
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 20:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 20:09 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 04s)
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 20:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 20:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 00m 42s)
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 19:22 Urbanecm: Morning B&C done
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 01m 05s)
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:15 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.18)
* 18:50 jynus: restart db2100:s7 to apply new config
* 19:12 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.16)
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7561926e1dede35c2ad27d587c044a5ebf5e6648}}: GrowthExperiments: Enable help panel top-posting on svwiki, ruwiki ([[phab:T268227|T268227]]) (duration: 01m 06s)
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 17:46 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 17:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 17:41 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2010.codfw.wmnet
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 17:29 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:22 mutante: DNS - new project language 'skr' added - Saraiki ( سرائیکی Sarā'īkī, also spelt Siraiki, or Seraiki) is an Indo-Aryan language of the Lahnda group, spoken in the south-western half of the province of Punjab in Pakistan.
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:12 elukey: move aqs1004 from rack A4 to A3 - [[phab:T267065|T267065]]
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:58 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 16:37 elukey: move analytics1070 from rack A7 to rack A5 - [[phab:T267065|T267065]]
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 15:13 godog: add ipv6 forward/reverse records for grafana1002 / grafana2001
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:05 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 14:57 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2009.codfw.wmnet
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 14:10 kormat: cleaning up heartbeat.heartbeat on pc3 [[phab:T268336|T268336]]
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 14:09 kormat: cleaning up heartbeat.heartbeat on pc2 [[phab:T268336|T268336]]
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 14:04 kormat: cleaning up heartbeat.heartbeat on pc1 [[phab:T268336|T268336]]
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 14:01 moritzm: imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia [[phab:T245757|T245757]]
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 13:56 XioNoX: push CR641960
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 13:56 godog: add ms-be106[0-3] to eqiad-prod with minimal weight - [[phab:T268435|T268435]]
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 13:17 moritzm: imported ploticus 2.42-4.2~wmf1 to buster-wikimedia [[phab:T245757|T245757]]
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 13:11 Lucas_WMDE: EU backport+config window done
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 13:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/Wikibase: Backport: [[gerrit:642103{{!}}Calculate page props on-the-fly during RDF dump (T145712)]] (duration: 01m 14s)
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 13:01 hnowlan: started cassandra pooling maps2009
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13370 and previous config saved to /var/cache/conftool/dbconfig/20201123-125815-marostegui.json
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13369 and previous config saved to /var/cache/conftool/dbconfig/20201123-125759-marostegui.json
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13368 and previous config saved to /var/cache/conftool/dbconfig/20201123-125417-marostegui.json
* 16:42 dancy@deploy1002: Sync cancelled.
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13367 and previous config saved to /var/cache/conftool/dbconfig/20201123-125345-marostegui.json
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 12:34 Lucas_WMDE: Undeployed patch for [[phab:T260349|T260349]]
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 12:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2008.codfw.wmnet
* 16:09 awight@deploy1002: backport aborted: (duration: 00m 33s)
* 12:32 Urbanecm: Run scap pull at mwdebug1003
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 12:28 marostegui: Stop mysql on db1121 to clone  clouddb1017:3314 clouddb1019:3314
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 12:27 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone clouddb1017:3314 clouddb1019:3314 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13366 and previous config saved to /var/cache/conftool/dbconfig/20201123-122549-marostegui.json
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c00d7e8e4c407b76aa2930dfa040394e874d77bc}}: Move ContentTranslation out of Beta for br, ka, ast, si and ig WPs ([[phab:T267212|T267212]], [[phab:T266217|T266217]], [[phab:T266218|T266218]], [[phab:T266219|T266219]], [[phab:T266220|T266220]]) (duration: 01m 06s)
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 12:01 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; [[phab:T246539|T246539]])
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 11:49 XioNoX: eqiad row A, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 05s)
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 21s)
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:25 hnowlan: starting cassandra bootstrap of maps2008
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:20 effie: enable puppet on cp* hosts
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:16 moritzm: installing poppler security updates on stretch
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 11:13 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 11:13 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:05 XioNoX: eqiad row A, standardize interfaces descriptions and ranges order
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:26 effie: disable puppet on cp* hosts to merge 641730
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 10:26 moritzm: rebooting serpens
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 10:21 XioNoX: eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:48 XioNoX: eqiad row B, standardize interfaces descriptions and ranges order
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:46 elukey: drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop  test cluster)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:43 godog: start stress testing on ms-be106* - [[phab:T268435|T268435]]
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:41 elukey: drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster)
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 08:36 elukey: drop analytics1028's krb principals from krb1001 - old decommed node
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:35 moritzm: installing remaining krb5 security updates for Stretch
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:27 marostegui: Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on Commosnwiki on wikireplicas - [[phab:T267090|T267090]]
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 06:46 marostegui: Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing [[phab:T267090|T267090]]
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-11-21 ==
== 2022-09-19 ==
* 09:18 joal: Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60Tb
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 09:17 joal: Drop historical logs of '
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 08:28 ariel@deploy1001: Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s)
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 08:28 ariel@deploy1001: Started deploy [dumps/dumps@1a76a9a]: revinfo updates
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 08:10 elukey: remove big stderrlog fine in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:05 elukey: remove big stderrlog fine in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 cjming: end of UTC late backport window
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 17:36 dancy@deploy1002: Sync cancelled.
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-11-20 ==
== 2022-09-17 ==
* 23:38 mutante: synced puppet-compiler facts - new hosts should be usable in compiler
* 12:17 Emperor: set thanos ring replicas to 3.80 [[phab:T311690|T311690]]
* 22:30 mutante: cumin1001 - sudo systemctl start cumin-check-aliases ->  <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK  [[phab:T268369|T268369]]
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
* 21:30 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
* 20:26 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
* 20:09 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
* 19:52 mutante: releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
* 19:45 mutante: releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed)
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
* 19:39 mutante: Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* an an-coord* that have disabled notifications to remove monitoring noise.  from 72 to 25 active alerts
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
* 19:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
* 18:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
* 18:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
* 18:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
* 18:36 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
* 18:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
* 18:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 18:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 18:14 dwisehaupt: shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - [[phab:T267259|T267259]]
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
* 17:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:32 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
* 17:24 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 16:48 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 16:40 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 16:29 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:29 razzi@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 16:28 razzi: removed canceled ip address records for kafka-test1002 from netbox
* 16:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:01 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:01 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:42 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:01 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 14:58 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:30 elukey: force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings
* 14:28 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 14:00 elukey: restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 13:34 liw: finished trying to test scap on beta cluster
* 13:24 bblack: cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs
* 13:12 liw: testing upcoming Scap release on beta
* 13:00 bblack: dns*: upgrade remainder of fleet to gdnsd to 3.4.1
* 12:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 12:29 moritzm: uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json
* 12:14 marostegui: Run check private data on clouddb1013:3311  clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 [[phab:T267090|T267090]]
* 12:11 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; [[phab:T246539|T246539]])
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P13348 and previous config saved to /var/cache/conftool/dbconfig/20201120-114614-marostegui.json
* 11:15 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13347 and previous config saved to /var/cache/conftool/dbconfig/20201120-104459-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13345 and previous config saved to /var/cache/conftool/dbconfig/20201120-102955-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13344 and previous config saved to /var/cache/conftool/dbconfig/20201120-101452-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13342 and previous config saved to /var/cache/conftool/dbconfig/20201120-095949-root.json
* 09:56 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/642346)
* 09:21 marostegui: Move pc2010 right under pc1007 to investigate lag issues (using orchestrator for this move)
* 09:07 moritzm: updating krb5 on krb*
* 08:57 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:50 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 08:32 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 08:31 elukey: roll restart kafka daemons on kafka-jumbo100* to pick up openjdk upgrades
* 08:13 marostegui: Enable GTID on clouddb1015:3316 clouddb1019:3316 - [[phab:T267090|T267090]]
* 08:10 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/642268)
* 08:04 marostegui: Stop db1124:3313 to clone clouddb1013:3313, clouddb1017:3313
* 08:00 XioNoX: update cloud-in4 filter in codfw
* 04:57 bblack: dns3001: upgrade gdnsd to 3.4.1
* 04:55 bblack: authdns1001: upgrade gdnsd to 3.4.1
* 04:49 bblack: authdns2001: upgrade gdnsd to 3.4.1
* 04:45 bblack: dns3002: upgrade gdnsd to 3.4.1
* 04:41 bblack: reprepro: uploaded gdnsd-3.4.1-1~wmf1 to buster-wikimedia


== 2020-11-19 ==
== 2022-09-16 ==
* 23:59 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
* 23:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
* 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
* 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for [[phab:T317904|T317904]]
* 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 23:06 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 22:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 22:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 22:23 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 22:07 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 22:06 krinkle@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo/: [[phab:T267668|T267668]] - {{Gerrit|I1115135ee}}, and {{Gerrit|Ic239bb9807}} (duration: 01m 07s)
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
* 20:12 herron: upgraded logstash-next to kibana 7.10
* 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 19:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 18:48 mutante: gerrit1001 - re-enabling puppet after merging gerrit:642086 for [[phab:T268260|T268260]] (upstream bug 13701)
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
* 18:41 mutante: gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%<nowiki>{</nowiki>REQUEST_SCHEME<nowiki>}</nowiki> in apache config, reloaded apache to fix redirect issue
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
* 18:37 mutante: gerrit1001 - disabled puppet
* 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 18:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
* 18:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
* 18:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 17:59 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
* 17:47 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
* 17:33 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s)
* 15:39 dancy@deploy1002: Started scap: testing
* 17:33 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
* 17:32 hashar: Upgrading Gerrit to 3.2.5 and restarting it
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 17:05 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s)
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 17:04 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:59 ryankemper: [[phab:T246345|T246345]] [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy
* 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 16:58 kormat: started mariadb on pc2010, now with more 🤞
* 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 16:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:54 kormat: stopping mariadb on pc2010
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:43 hashar: Restarting Gerrit replica instance on gerrit2001
* 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:42 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s)
* 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 16:42 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server)
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:41 kormat: stopped and started replication on pc2010 to see if that would help it recover
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 16:40 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s)
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:40 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:35 elukey: roll restart hadoop workers for openjdk upgrades
* 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 16:35 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:58 moritzm: installing jupyter-notebook security updates on an-coord*
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:56 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:52 bblack: dns*: upgrade to gdnsd-3.4.0 on remainder of the dns fleet'
* 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:44 bblack: dns3001: upgrade gdnsd to 3.4.0
* 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:41 bblack: dns1001: upgrade gdnsd to 3.4.0
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
* 15:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:36 bblack: dns3002: upgrade gdnsd to 3.4.0
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:31 bblack: authdns1001: upgrade gdnsd to 3.4.0
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:30 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
* 15:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
* 15:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
* 15:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
* 15:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
* 15:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
* 15:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
* 14:57 moritzm: installing openldap security updates on buster (client side tools/libs, slapd already updated)
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
* 14:54 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
* 14:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
* 14:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
* 14:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
* 14:47 marostegui: Sanitize enwiki on clouddb1017 [[phab:T267090|T267090]]
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
* 14:45 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
* 14:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
* 14:43 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
* 14:41 marostegui: Sanitize enwiki on clouddb1013 [[phab:T267090|T267090]]
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
* 14:39 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
* 14:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
* 14:29 moritzm: rolling restart of app server canaries to pick up latest sec updates
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
* 14:21 moritzm: installing krb5 security updates on stretch
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
* 14:02 bblack: authdns2001: upgrade gdnsd to 3.4.0
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 13:45 XioNoX: push current state of audited cloud-in4 filter - [[phab:T264993|T264993]]
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 13:42 moritzm: removing stray wireshark 2.2.6 wireshark libs on Stretch
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
* 13:32 moritzm: installing wireshark security updates
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
* 13:30 bblack: dns4002: upgrade gdnsd to 3.4.0
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
* 13:28 bblack: reprepro: updated buster-wikimedia gdnsd package to 3.4.0-1~wmf1
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
* 12:43 moritzm: installing libproxy security updates on stretch
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
* 12:38 marostegui: Stop mysql on db1106 to clone clouddb1013 and clouddb1017 [[phab:T267090|T267090]]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
* 12:00 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
* 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
* 11:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
* 11:44 moritzm: installing Java security updates on Hadoop/Kafka Jumbo hosts
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
* 11:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
* 11:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 11:00 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; [[phab:T246539|T246539]])
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:28 marostegui: Restart mysql on db1115, tendril and dbtree will be down for a few minutes
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
* 09:40 marostegui: Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - [[phab:T267090|T267090]]
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
* 09:29 moritzm: upgrading serpens to Buster
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
* 09:26 XioNoX: eqiad row C: move Ganeti/LVS interfaces to individual terms
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
* 09:07 elukey: restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
* 08:56 effie: disable puppet on mw canaries to merge 641816
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:49 elukey: restart hadoop daemons on analytics1058 for openjdk upgrades (canary)
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:25 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
* 08:19 XioNoX: eqiad row C: standardize interfaces config
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
* 07:55 XioNoX: eqiad row D: move Ganeti/LVS interfaces to individual terms
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
* 07:47 XioNoX: eqiad row D: standardize interfaces config
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
* 07:22 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
* 07:05 elukey: roll restart java daemons on Hadoop test for openjdk upgrades
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
* 07:05 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
* 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:21 marostegui: Remove es1014 from tendril and zarcillo [[phab:T268102|T268102]]
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:08 marostegui: Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - [[phab:T267090|T267090]]
* 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
* 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
* 05:51 marostegui: Install 10.6 on db1168 [[phab:T301879|T301879]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
* 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
* 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)


== 2020-11-18 ==
== 2022-09-15 ==
* 23:34 mutante: disabling puppet on memcache::mediawiki - deploying gerrit:637742
* 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
* 22:56 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]] (duration: 00m 04s)
* 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:56 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]]
* 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:30 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 22:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] (duration: 07m 06s)
* 22:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; [[phab:T268181|T268181]]) (duration: 01m 04s)
* 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; [[phab:T268181|T268181]]) (duration: 01m 06s)
* 20:18 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]]
* 22:05 urbanecm@deploy1001: Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list ([[phab:T268181|T268181]]) (duration: 01m 05s)
* 20:15 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] (duration: 07m 39s)
* 21:53 mutante: mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring [[phab:T267248|T267248]]
* 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:47 mutante: mwdebug1003 - scap pull - [[phab:T267248|T267248]]
* 20:07 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]]
* 21:40 mutante: mw1317,mw1318 - back in action and all monitoring activated again
* 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start [[phab:T316236|T316236]]
* 21:17 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler
* 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
* 18:39 dancy@deploy1002: Finished scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] (duration: 09m 53s)
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
* 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
* 21:02 mutante: mw1317,mw1318 - repooled=no after physical move to rack B
* 18:29 dancy@deploy1002: dancy and cscott: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 18:29 dancy@deploy1002: Started scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]]
* 20:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
* 20:27 mutante: mw1317, mw1318 shutting down for physical move
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1318.eqiad.wmnet
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1317.eqiad.wmnet
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 20:15 mutante: mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 ([[phab:T266164|T266164]])
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
* 20:10 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 03s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
* 20:09 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 18:15 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 20:03 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:03 akosiaris@cumin1001: conftool action : set/weight=0; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 19:53 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 18:07 godog: restart envoyproxy on thanos-fe*
* 19:48 otto@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 06s)
* 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 19:45 otto@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 07s)
* 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 19:26 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:635607 - Switch ParserCache to JSON for group0 wikis (duration: 01m 05s)
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:635086 - Enable parsoid on api_appserver (duration: 01m 04s)
* 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
* 19:19 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
* 19:13 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:641527 - Set  to 0 (duration: 01m 04s)
* 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 18:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 17:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 17:18 elukey: shutdown an-presto1004 for hw maintenance
* 15:22 hnowlan: starting cassandra on sessionstore1001-a
* 17:13 akosiaris: [[phab:T241230|T241230]] pool codfw kubernetes for recommendation-api at a very low weight
* 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 17:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
* 17:12 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
* 16:52 jbond42: drop os_version/requiers_os functions from wmflib
* 14:41 moritzm: installing libtirpc security updates
* 16:50 elukey: update /etc/krb5.keytab on krb1001/krb2001 to match the most up to date key version for host/krb2001.codfw.wmnet
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:01 sukhe: retarting bird.service on A:dns-auth for zlib update
* 16:44 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6b9784a0708cf1e7762034ccfba7e5604b2f6dc2}}: Enable the Vue version of the mentee overview in pilot wikis ([[phab:T300532|T300532]]) (duration: 03m 45s)
* 16:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
* 16:38 reedy@deploy1001: Synchronized wmf-config/logging.php: [[phab:T268141|T268141]] (duration: 01m 06s)
* 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
* 16:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:57 sukhe: retarting haproxy.service on A:dns-auth for zlib update
* 16:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
* 16:27 robh@cumin1001: START - Cookbook sre.dns.netbox
* 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - [[phab:T289766|T289766]]
* 15:59 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
* 15:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:16 Urbanecm: mwscript deleteEqualMessages.php --wiki=cswiki --delete
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
* 15:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
* 15:05 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: {{Gerrit|f592e85858d17a2de99cde93627054ee4972c2bd}}: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one ([[phab:T300532|T300532]]) (duration: 04m 05s)
* 15:03 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:30 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 14:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 14:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
* 14:13 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - [[phab:T289766|T289766]]
* 14:09 elukey: copied /etc/krb5.keytab from krb1001 to krb2001 (the last one contained only one principal for 2001, the first one both for 1001 and 2001)
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 14:05 moritzm: installing openldap security updates on ro replicas
* 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 14:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
* 14:02 elukey: restart krb5-kpropd.service on krb2001 to force the pick up of new client configs
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 13:35 bblack: cache_text: Executing "varnishadm -n frontend param.set nuke_limit 1000" - [[phab:T266373|T266373]]
* 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 13:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 13:30 moritzm: installing openldap security updates on corp replicas
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
* 13:08 Urbanecm: EU B&C done (~15 minutes ago)
* 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 12:43 akosiaris: sync staging cluster's helmfile.d/admin state. Aside from calico, the rest is a noop
* 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
* 12:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - [[phab:T289766|T289766]]
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|5488f56c7458fa8fb9be5f41f131e00b26a84cc0}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:25 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|45d71a37f381e81e5382c8e10ac4063c9665beb8}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
* 12:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/<nowiki>{</nowiki>bnwiki,bnwiki-1.5x,bnwiki-2x<nowiki>}</nowiki>.png ([[phab:T265553|T265553]])
* 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
* 12:13 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=releases
* 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 12:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|70aabf7ec8e1b549e78978e48967fb70d21316de}}: Regenerate Bengali Wikipedia logo ([[phab:T265553|T265553]]) (duration: 01m 06s)
* 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 12:06 akosiaris@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=wikifeeds
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
* 12:01 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
* 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after restarting mysql [[phab:T266483|T266483]] (duration: 01m 06s)
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
* 12:00 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=blubberoid,name=eqiad
* 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 11:56 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; [[phab:T246539|T246539]])
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
* 11:56 marostegui: Restart mysql on pc1009 [[phab:T266483|T266483]]
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
* 11:56 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
* 11:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 01m 18s)
* 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 11:40 XioNoX: eqiad row D: remove un-needed "enable" keywords
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99)
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
* 10:59 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
* 10:58 jbond42: renew sretest1002 ssl cert to test cookbook
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:25 godog: ms-be1022 - disable failed sdb
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
* 10:01 XioNoX: eqiad row D: Standardize interfaces descriptions
* 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
* 09:56 moritzm: uploaded libexif 0.6.21-2+deb8u4+wmf1 to jessie-wikimedia
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
* 09:22 elukey: set dns_canonicalize_hostname = false to all kerberos clients
* 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
* 09:13 jbond42: renew puppet certificate of seaborgium
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
* 08:34 marostegui: Stop MySQL on es1011, es1012, es1014 [[phab:T268100|T268100]] [[phab:T268101|T268101]] [[phab:T268102|T268102]]
* 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1012 from dbctl [[phab:T268101|T268101]]', diff saved to https://phabricator.wikimedia.org/P13326 and previous config saved to /var/cache/conftool/dbconfig/20201118-082942-marostegui.json
* 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13325 and previous config saved to /var/cache/conftool/dbconfig/20201118-082636-marostegui.json
* 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13324 and previous config saved to /var/cache/conftool/dbconfig/20201118-082618-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 80%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13323 and previous config saved to /var/cache/conftool/dbconfig/20201118-081115-root.json
* 08:49 apergos: UTC backport training window closed at lsat
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13322 and previous config saved to /var/cache/conftool/dbconfig/20201118-075612-root.json
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:45 marostegui: Deploy schema change on db1098:3316 [[phab:T267335|T267335]] [[phab:T267399|T267399]]
* 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13321 and previous config saved to /var/cache/conftool/dbconfig/20201118-074108-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
* 07:28 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 08:43 aqu: about to deploy analytics/refinery
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13320 and previous config saved to /var/cache/conftool/dbconfig/20201118-072605-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
* 07:16 marostegui: Run check table on s6 on db1125:3316 [[phab:T267090|T267090]]
* 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13319 and previous config saved to /var/cache/conftool/dbconfig/20201118-071101-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13318 and previous config saved to /var/cache/conftool/dbconfig/20201118-065558-root.json
* 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832339{{!}}Enable action blocks on ptwiki (T317157)]] (duration: 04m 07s)
* 06:53 elukey: restart also mirror maker on kafka-main1001/1003 (seems not related but just to clear old errors and a possible weird state)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 100%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13317 and previous config saved to /var/cache/conftool/dbconfig/20201118-064556-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13316 and previous config saved to /var/cache/conftool/dbconfig/20201118-064054-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json