You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Labslogbot
(legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s) (logmsgbot))
imported>Stashbot
(ejegg: payments-wiki upgraded from 15395d05 to 08b8c3bc (upgraded from MW 1.35 to MW 1.39))
 
== 2015-07-18 ==
== 2023-01-26 ==
* 20:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s)
* 01:24 ejegg: payments-wiki upgraded from {{Gerrit|15395d05}} to {{Gerrit|08b8c3bc}} (upgraded from MW 1.35 to MW 1.39)
* 20:44 YuviPanda: restarted etherpad
* 01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
* 18:56 akosiaris: reinstall labsdb1004
* 01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
* 16:36 paravoid: Ganglia is up :)
* 01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
* 16:09 Krenair: Ganglia seems down
* 01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
* 15:42 Krenair: Doing T44180
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 05:28:25 UTC 2015 (duration 28m 24s)
* 01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 02:34 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-18 02:34:29+00:00
* 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 19s)
* 01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 02:07:38 UTC 2015 (duration 7m 37s)
* 01:00 ejegg: turned pending transaction resolvers back on after civi deploy
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-18 02:03:29+00:00
* 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
* 00:49 ejegg: restored recurring globalcollect batch size of 250
* 00:50 ejegg: civicrm upgraded from {{Gerrit|3e6b21b6}} to {{Gerrit|b5d6a790}}
* 00:09 ejegg: updated civicrm from 78de1b9b74934984af3099afe9192fa53011bdaa to 292ad137f6b3ffc818a3bd617ca4f335931091f3
* 00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 00:49 sukhe: depool cp2028 for testing firmware update cookbook: [[phab:T321309|T321309]]
* 00:49 ejegg: disabled pending transaction resolvers for civi deploy
* 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
* 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn


== 2015-07-17 ==
== 2023-01-25 ==
* 21:51 ejegg: updated civicrm from 0acac037ce0c9a64e94a475463deb2d47e84193a to 78de1b9b74934984af3099afe9192fa53011bdaa
* 23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
* 20:53 matt_flaschen: Manually fixed issue in mediawikiwiki LQT thread table with rename of Ecliptica to Entropy. https://phabricator.wikimedia.org/T106122#1461380
* 23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
* 20:03 hashar: stopping Zuul to get rid of a faulty registered function "build:Global-Dev Dashboard Data". Job is gone already.
* 23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 17:50 ejegg: updated civicrm from fa724dd2e2e69545d81015c943cb7f52cf6de8e1 to 0acac037ce0c9a64e94a475463deb2d47e84193a
* 23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 16:49 gwicke: restarted restbase on restbase1001
* 23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
* 15:04 gwicke: restarted RB thinner scripts, see https://phabricator.wikimedia.org/T105706
* 23:21 zabe@deploy1002: Started scap: (no justification provided)
* 14:10 urandom: restart restbase service on restbase1006
* 23:20 zabe@deploy1002: Backport cancelled.
* 14:07 urandom: restart restbase service on restbase1003
* 23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
* 14:05 urandom: restart restbase service on restbase1002
* 23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
* 13:56 godog: apache2ctl graceful on fluorine antimony argon caesium helium
* 23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
* 13:43 godog: apache2ctl graceful on netmon1001
* 22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 11:24 hashar: rebooted labnodepool1001.eqiad.wmnet. Accidentally deleted the whole /dev, which froze everything :(
* 22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 10:21 _joe_: repooling mw1158
* 22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
* 09:08 _joe_: depooling mw1158, repooling mw1156,7
* 22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
* 07:51 _joe_: depooled mw1156,7 for reimaging
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 04:53:56 UTC 2015 (duration 53m 55s)
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1030 (duration: 00m 12s)
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 02:30 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-17 02:30:03+00:00
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 02:26 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 05m 55s)
* 21:34 samtar@deploy1002: Finished scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] (duration: 09m 27s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 02:07:22 UTC 2015 (duration 7m 20s)
* 21:26 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] synced to the testservers: mwdebug2002.cod
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-17 02:03:12+00:00
* 21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 01:30 mutante: git pull origin on strontium
* 21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 21:24 samtar@deploy1002: Started scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]]
* 21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
* 20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 20:49 ejegg: updated employers.csv on paymentswiki
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
* 20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
* 19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
* 19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]] (duration: 07m 04s)
* 19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]]
* 19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
* 19:01 brennen: 1.40.0-wmf.20 train ([[phab:T325583|T325583]]): no blockers, rolling to group1.
* 19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
* 18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
* 17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
* 17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service [[phab:T327405|T327405]]
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
* 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
* 16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
* 16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
* 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
* 15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:50 robh: db1139 ilom wins/netbios disabled and ilom reset [[phab:T327877|T327877]]
* 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
* 15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:43 robh: netbios wins disabled on db1140 ilom and ilom reset [[phab:T327877|T327877]]
* 15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:38 papaul: ongoing maintenance on fasw-c-eqiad
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
* 15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
* 15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for [[phab:T327824|T327824]] (duration: 07m 57s)
* 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for [[phab:T327824|T327824]]
* 15:04 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] (duration: 08m 43s)
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
* 15:01 urbanecm: Overrunning B&C window
* 14:57 urbanecm@deploy1002: urbanecm and migr: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
* 14:55 urbanecm@deploy1002: Started scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]]
* 14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 14:53 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] (duration: 32m 21s)
* 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
* 14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main'.
* 14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'.
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main'.
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
* 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
* 14:21 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]]
* 14:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] (duration: 12m 59s)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
* 14:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]]
* 13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
* 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
* 13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
* 13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
* 13:04 jbond: enable puppet fleet wide post deploy of gerrit:883233
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
* 12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
* 12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 12:45 moritzm: restarting Exim on MXes to pick up new libtasn
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
* 12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
* 12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
* 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
* 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
* 12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 moritzm: installing libtasn security updates on buster
* 11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump ([[phab:T325942|T325942]])
* 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
* 11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
* 10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
* 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
* 09:30 Emperor: rolling depool & update of thanos front-ends [[phab:T327871|T327871]]
* 08:40 XioNoX: bump SGIX max prefix limit
* 08:13 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] (duration: 10m 13s)
* 08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:03 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]]
* 07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) [[phab:T327859|T327859]]
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 [[phab:T327859|T327859]]', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
* 07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
* 07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
* 07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
* 07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
* 07:08 AndyRussG: updated payments (config only) revision {{Gerrit|15395d05}}, config {{Gerrit|418160e9}}
* 04:10 eileen: config revision changed from {{Gerrit|dc0a0d3a}} to {{Gerrit|089d0acb}}
* 04:01 eileen: civicrm upgraded from {{Gerrit|9197ca29}} to {{Gerrit|3e6b21b6}}
* 03:27 eileen: civicrm upgraded from {{Gerrit|f6093fb2}} to {{Gerrit|9197ca29}}
* 03:05 eileen: config revision changed from {{Gerrit|3f641fce}} to {{Gerrit|dc0a0d3a}}
* 01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
* 00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye


== 2015-07-16 ==
== 2023-01-24 ==
* 21:27 ori: bounced nutcracker on mw1139 as well. hashar noticed flood of errors from these hosts on https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors . lack of monitoring / alerts is troubling.
* 23:10 zabe@deploy1002: Finished scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] (duration: 08m 02s)
* 21:26 ori: bounced nutcracker on mw1128 and mw1134
* 23:04 zabe@deploy1002: zabe: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:50 mutante: iegreview tool - short maintenance downtime
* 23:02 zabe@deploy1002: Started scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]]
* 19:39 YuviPanda: imported aspell-id from ubuntu to jessie-wikimedia - needed by ores, simple package that I am not sure why it is not in jessie
* 22:47 TheresNoTime: closing UTC late backport window
* 19:20 logmsgbot: twentyafterfour Synchronized php-1.26wmf14/includes/db/LoadMonitor.php: Deploying Hotfix for T105373 (duration: 00m 13s)
* 22:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] (duration: 09m 04s)
* 18:40 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf14
* 22:39 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:26 ejegg: changed batch size from 250 to 1 in RGC jenkins job
* 22:37 samtar@deploy1002: Started scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]]
* 18:22 ejegg: updated civicrm from 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7 to fa724dd2e2e69545d81015c943cb7f52cf6de8e1
* 22:30 samtar@deploy1002: Finished scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] (duration: 07m 59s)
* 16:56 Jeff_Green: authdns update to rename lutetium.wm.o
* 22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:08 hashar_: kept nodepool stopped on labnodepool1001.eqiad.wmnet because it spams the cron log
* 22:22 samtar@deploy1002: Started scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]]
* 15:57 logmsgbot: demon Synchronized multiversion/MWMultiVersion.php: prod no-op, beta change (duration: 00m 13s)
* 22:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] (duration: 09m 02s)
* 15:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/224975/ (duration: 00m 12s)
* 22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. [[phab:T327813|T327813]]
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf14/extensions/Math/MathMathML.php: SWAT: Fix: Undefined variable passed hook [[gerrit:225058]] (duration: 00m 12s)
* 22:13 samtar@deploy1002: samtar and stang: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:03 ejegg: updated payments from 4ca95d55a9745c05ccfbb16ee6f23a6f75328824 to ebb1a9e52172a4793cf5feb33220b4d7edfcad70
* 22:11 samtar@deploy1002: Started scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]]
* 12:21 dcausse: es1.6 upgrade: all done
* 22:08 samtar@deploy1002: Finished scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] (duration: 09m 36s)
* 11:32 dcausse: restarted gmond on elastic1024
* 22:06 TheresNoTime: extending UTC late backport window due to late start
* 11:06 mobrovac: citoid deploying ff90869
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
* 10:56 dcausse: es1.6 upgrade: upgrade elastic1031
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
* 10:25 mobrovac: citoid rolled back to ffbaf6d
* 22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
* 10:10 mobrovac: citoid deploying 5aeb0fc
* 22:00 samtar@deploy1002: samtar and jdrewniak: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:05 dcausse: es1.6 upgrade: upgrade elastic1030
* 21:59 samtar@deploy1002: Started scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]]
* 09:38 dcausse: es1.6 upgrade: upgrade elastic1029
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] (duration: 13m 31s)
* 08:42 dcausse: es1.6 upgrade: upgrade elastic1028
* 21:45 samtar@deploy1002: nray and samtar: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:31 dcausse: es1.6 upgrade: upgrade elastic1027
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
* 07:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 07:22:49 UTC 2015 (duration 22m 48s)
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 05:53 dcausse: es1.6 upgrade: upgrade elastic1026
* 21:43 samtar@deploy1002: Started scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]]
* 05:31 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 05:24 logmsgbot: krenair Synchronized php-1.26wmf14/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225008/ (duration: 00m 13s)
* 21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 04:38 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225006/ (duration: 00m 13s)
* 21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # [[phab:T275246|T275246]]
* 03:54 manybubbles: es1.6 upgrade: upgrade elastic1025
* 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 03:19 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-16 03:19:37+00:00
* 21:32 samtar@deploy1002: backport aborted:  (duration: 06m 28s)
* 03:13 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 10m 23s)
* 21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 02:46 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-16 02:46:03+00:00
* 21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 02:43 manybubbles: es1.6 upgrade: upgrade elastic1024
* 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
* 02:39 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 10m 50s)
* 21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 02:07:55 UTC 2015 (duration 7m 54s)
* 21:03 TheresNoTime: holding UTC late backport window for outage, [[phab:T327815|T327815]]
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-16 02:03:31+00:00
* 21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-16 02:03:30+00:00
* 20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 01:41 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/214981/ (duration: 00m 12s)
* 20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- [[phab:T325132|T325132]]
* 01:22 manybubbles: es1.6 upgrade: upgrade elastic1023
* 20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
* 20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
* 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
* 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
* 20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
* 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
* 20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
* 20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
* 20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
* 20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
* 20:16 bblack: pool cp5032
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
* 20:16 mutante: contint2001 - restarted zuul
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
* 20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
* 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
* 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
* 20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
* 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: [[phab:T326634|T326634]]
* 19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: [[phab:T326634|T326634]]
* 19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: [[phab:T326634|T326634]]
* 19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 19:39 urandom: rebooting restbase cassandra nodes, row d -- [[phab:T325132|T325132]]
* 19:33 bblack: cp5032: restart varnish-frontend
* 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
* 19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: [[phab:T326634|T326634]]
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
* 19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]]
* 19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:55 jynus: deploy new dump grants for analytics dbs at db1108 [[phab:T327155|T327155]]
* 18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
* 18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
* 17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
* 17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
* 17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 17:19 thcipriani: restarting ci jenkins for updates
* 17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 17:04 urandom: rebooting restbase cassandra nodes, row c -- [[phab:T325132|T325132]]
* 16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
* 16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
* 15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
* 15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
* 15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
* 15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
* 14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
* 14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:25 TheresNoTime: close UTC afternoon backport window
* 14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:20 XioNoX: repool ulsfo (maintenance over)
* 14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
* 14:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] (duration: 07m 41s)
* 14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:11 samtar@deploy1002: daniel and samtar: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:09 samtar@deploy1002: Started scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]]
* 13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:44 XioNoX: reboot ulsfo switches for software upgrade
* 13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
* 13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
* 12:56 zabe@deploy1002: Finished scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] (duration: 44m 09s)
* 12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:48 XioNoX: restart ulsfo switches for network maintenance
* 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
* 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
* 12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:38 zabe@deploy1002: zabe: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 12:12 zabe@deploy1002: Started scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]]
* 11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
* 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
* 11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
* 11:26 zabe@deploy1002: Finished scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] (duration: 09m 19s)
* 11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
* 11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
* 11:19 zabe@deploy1002: zabe: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:17 zabe@deploy1002: Started scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]]
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
* 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
* 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
* 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
* 10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 10:49 XioNoX: depool ulsfo for network maintenance - [[phab:T316532|T316532]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
* 10:33 vgutierrez: repool cp4046
* 10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:31 vgutierrez: restarting varnish on cp4046
* 10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:29 vgutierrez: depool cp4046
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
* 10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
* 10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
* 10:17 effie: depooling maps from eqiad && pooling maps on codfw
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
* 10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - [[phab:T327754|T327754]]
* 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
* 09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
* 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
* 09:41 moritzm: installing libtasn1-6 security updates on buster
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
* 09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
* 09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
* 09:14 kart_: Done: UTC morning backport window
* 09:13 kartik@deploy1002: Finished scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] (duration: 09m 44s)
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
* 09:05 kartik@deploy1002: awight and kartik: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:03 kartik@deploy1002: Started scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
* 09:01 kartik@deploy1002: Finished scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] (duration: 10m 42s)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
* 08:52 kartik@deploy1002: awight and kartik: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:50 kartik@deploy1002: Started scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]]
* 08:48 kartik@deploy1002: Finished scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] (duration: 15m 20s)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
* 08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - [[phab:T327745|T327745]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
* 08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
* 08:33 kartik@deploy1002: Started scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
* 08:28 kartik@deploy1002: Finished scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] (duration: 09m 09s)
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
* 08:21 kartik@deploy1002: kartik and matmarex: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
* 08:19 kartik@deploy1002: Started scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]]
* 08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - [[phab:T327739|T327739]]
* 08:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] (duration: 10m 25s)
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]]
* 07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
* 07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T327616|T327616]]', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
* 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
* 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
* 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
* 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
* 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
* 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
* 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]] (duration: 53m 01s)
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]]
* 03:30 AndyRussG: payments-wiki upgraded from {{Gerrit|3d882ac7}} to {{Gerrit|15395d05}}
* 02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
* 02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
* 01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
* 01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
* 01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
* 01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
* 00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
* 00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
* 00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
* 00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
* 00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
* 00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
* 00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
* 00:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] (duration: 12m 47s)
* 00:03 zabe@deploy1002: zabe: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:01 zabe@deploy1002: Started scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]]


== 2015-07-15 ==
== 2023-01-23 ==
* 23:36 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s)
* 23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
* 23:22 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s)
* 23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s)
* 23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
* 23:10 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s)
* 23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
* 23:09 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s)
* 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s)
* 23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
* 22:23 csteipp: deploy patch for T105305 to wmf13/14
* 22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
* 22:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
* 22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 21:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 21:54 manybubbles: es1.6 upgrade: upgrade elastic1022
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:37 manybubbles: es1.6 upgrade: upgrade elastic1021
* 22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
* 21:09 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
* 22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
* 20:41 bblack: restarted salt-master service on palladium
* 22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
* 20:33 bblack: globally cleaning up dangling symlinks left in /etc/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
* 22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
* 20:30 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
* 22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
* 20:20 manybubbles: es1.6 upgrade: upgrade elastic1020
* 22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
* 20:18 RoanKattouw: Running FlowCreateMentionTemplate.php on all Flow wikis
* 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
* 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
* 22:31 maryum: Deployed patch for [[phab:T285159|T285159]]
* 19:50 ejegg: updated civicrm from e29cc5f20b5069afcaff794e628596c1f70d69a3 to 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7
* 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
* 19:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
* 21:40 zabe@deploy1002: Finished scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] (duration: 07m 45s)
* 19:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
* 21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
* 19:00 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
* 21:34 zabe@deploy1002: zabe: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 21:32 zabe@deploy1002: Started scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]]
* 18:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
* 18:40 ejegg: updated civicrm from f4219bc8eca5e4db633da07b6ac9e2505cfbae16 to e29cc5f20b5069afcaff794e628596c1f70d69a3
* 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
* 18:39 logmsgbot: krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
* 21:12 kindrobot: close UTC late backport window
* 18:39 logmsgbot: twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
* 21:12 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] (duration: 09m 00s)
* 18:21 manybubbles: es1.6 upgrade: upgrading elastic1019
* 21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 18:20 Jeff_Green: authdns-update shifting to service-oriented hostnames for fundraising cluster
* 21:03 kindrobot@deploy1002: Started scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]]
* 18:06 logmsgbot: twentyafterfour Started scap: group0 to 1.26wmf14
* 21:01 kindrobot: start UTC late backport window
* 17:55 ejegg: updated civicrm from 6560cefa8d7e68e35e30b310d6691ab57798a4c9 to f4219bc8eca5e4db633da07b6ac9e2505cfbae16
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 17:34 Jeff_Green: authdns-update to remove boron.wm.o
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 17:22 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesn't quite work (duration: 00m 13s)
* 20:45 taavi: restart [[phab:T315510|T315510]] on group1 after mwmaint restart, currently running on wikidatawiki
* 17:17 Jeff_Green: authdns-update to remove aluminium, also lanthanum by preexisting commit
* 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
* 16:45 andrewbogott: rebooting labvirt1005
* 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
* 16:43 mutante: accepting unaccepted salt keys for ganeti VMs, planet, bromine, krypton
* 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
* 16:39 mutante: krypton - signing puppet cert, initial run
* 19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
* 16:26 andrewbogott: woo, first try!
* 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
* 16:23 andrewbogott: trying to kill labvirt1005 via repeated instance suspend/resume
* 19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 16:04 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 16:03 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 16:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s)
* 19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 15:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s)
* 19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 15:35 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 11s)
* 18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
* 15:29 logmsgbot: krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf  - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - [[phab:T327405|T327405]]
* 15:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf  - unlink /etc/apache2/mods-enabled/auth_cas.load
* 15:20 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
* 18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
* 14:33 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 14:22 legoktm: sync failed on mw1090.eqiad.wmnet, read only filesystem
* 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 14:20 logmsgbot: legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
* 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
* 13:55 dcausse: es1.6 upgrade: upgrade elastic1018
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
* 13:24 springle: entry below not mw1216 fault, but r/o filesystem error on mw1090
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
* 13:15 springle: sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
* 13:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:43 dcausse: es1.6 upgrade: upgrade elastic1017
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 08:27 dcausse: es1.6 upgrade: upgrade elastic1016
* 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 06:31 dcausse: es1.6 upgrade: upgrade elastic1015
* 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 05:40 dcausse: es1.6 upgrade: upgrade elastic1014
* 16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 05:10 springle: db1030 busy removing table partitioning
* 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 04:28 manybubbles: es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
* 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 04:12 manybubbles: es1.6 upgrade: upgrade elastic1013
* 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 03:49 springle: upgrade db1030 trusty
* 16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 03:29 manybubbles: es1.6 upgrade: upgrade elastic1012
* 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 03:14 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 03:10 logmsgbot: reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 03:03 manybubbles: es1.6 upgrade: raised limits on shard migration rate - should speed up the restart. we should lower it before we do restarts during europe's morning
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 02:10 Reedy: Running LU manually to see what's wrong with it
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
* 15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: [[phab:T326634|T326634]]
* 15:50 urbanecm: Deploy security patch for [[phab:T327613|T327613]]
* 15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
* 15:44 papaul: ongoing maintenance on fasw-codfw
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
* 15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: [[phab:T325563|T325563]]
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
* 15:09 taavi@deploy1002: Finished scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] (duration: 07m 28s)
* 15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 15:02 taavi@deploy1002: Started scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]]
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
* 15:00 taavi@deploy1002: Finished scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] (duration: 07m 56s)
* 14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:53 taavi@deploy1002: taavi and sbailey: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:52 taavi@deploy1002: Started scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]]
* 14:46 taavi@deploy1002: Finished scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] (duration: 08m 48s)
* 14:42 sukhe: rolling out pybal 1.15.10: [[phab:T321191|T321191]]
* 14:39 taavi@deploy1002: taavi and func: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:37 taavi@deploy1002: Started scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]]
* 14:37 taavi@deploy1002: Finished scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] (duration: 11m 24s)
* 14:27 taavi@deploy1002: stang and taavi: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 14:25 taavi@deploy1002: Started scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]]
* 14:25 taavi@deploy1002: Finished scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] (duration: 09m 22s)
* 14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # [[phab:T326387|T326387]]
* 14:17 taavi@deploy1002: taavi and stang: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:16 taavi@deploy1002: Started scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]]
* 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
* 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
* 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
* 12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
* 12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
* 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
* 11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
* 11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
* 11:57 marostegui: Reboot db2132 (m1 codfw master)
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
* 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
* 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
* 11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - [[phab:T327644|T327644]]
* 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
* 10:55 XioNoX: update management routers ACLs to add new bast hosts
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
* 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
* 10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:07 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] (duration: 07m 51s)
* 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:01 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:59 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]]
* 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 zabe@deploy1002: Finished scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] (duration: 07m 48s)
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
* 08:39 zabe@deploy1002: zabe: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:37 zabe@deploy1002: Started scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]]
* 08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
* 08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 08:30 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] (duration: 17m 12s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
* 08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:12 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
* 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
* 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
* 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
* 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:23 kart_: Updated cxserver to 2023-01-20-051603-production ([[phab:T323840|T323840]], [[phab:T326236|T326236]])
* 06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
* 04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - [[phab:T327611|T327611]]
* 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 [[phab:T327609|T327609]]', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
* 03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - [[phab:T327609|T327609]]


== 2015-07-14 ==
== 2023-01-20 ==
* 23:46 manybubbles: es1.6 upgrade: upgraded elastic1011
* 18:22 jynus: deploying new grants for backups on m1 [[phab:T327155|T327155]]
* 23:22 bblack: updating nginx to 1.9.3-1+wmf1 on cp*
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 23:17 bblack: reprepro: nginx for jessie-wikimedia/main bumped to 1.9.3-1+wmf1
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 22:22 ejegg: updated civicrm from 04efc7d5c7bbb068f907125f2184692aee676123 to 6560cefa8d7e68e35e30b310d6691ab57798a4c9
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 21:29 Reedy: mw1090 fs is ro
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 21:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Fix testwiki
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 21:05 _joe|AFK: depooling mw1090, ext4 errors in syslog, filesystem mounted read-only
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 21:01 logmsgbot: twentyafterfour Synchronized wmf-config/CommonSettings.php: revert LCStoreStaticArray (duration: 00m 12s)
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 20:59 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf14 and rebuild localization cache (duration: 72m 45s)
* 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 20:42 bblack: undoing LCStoreStaticArray because appservers look unhealthy, using ori's command: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"'
* 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:46 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf14 and rebuild localization cache
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:23 manybubbles: es1.6 step iforget: upgrade elasticsearch on elastic1010
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:41 mutante: terbium: /usr/local/bin/foreachwiki extensions/Echo/maintenance/processEchoEmailBatch.php
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 17:10 dcausse: es1.6 step 10: upgrade elastic1009
* 13:08 moritzm: installing node-minimatch security updates
* 16:23 mutante: bromine - apt-get upgrade
* 13:01 moritzm: installing libxstream-java security updates
* 15:08 logmsgbot: manybubbles Synchronized php-1.26wmf13/extensions/UniversalLanguageSelector/: SWAT add some hooks to extension.json (duration: 00m 13s)
* 13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: [[phab:T325557|T325557]]
* 14:34 gwicke: started RESTBase revision thin-out script for html and data-parsoid on wikimedia domains
* 12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 14:01 dcausse: es1.6 step 9: upgrade elastic1008
* 12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
* 12:48 _joe_: reimaging mw1155
* 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 12:17 ori: Logging a message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log.
* 12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 11:28 dcausse: es1.6 step 8: upgrade elastic1007
* 12:17 moritzm: installing ping1003 [[phab:T273509|T273509]]
* 11:25 _joe_: repooling mw1154 with HHVM
* 12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
* 10:12 _joe_: stopped poolcounter on mw1154
* 12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 10:06 _joe_: reimaging mw1154
* 12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 07:49 dcausse: es1.6 step 7: upgrade elastic1006
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 07:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 07:09:10 UTC 2015 (duration 9m 9s)
* 10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 06:48 dcausse: es1.6 step 6: upgrade elastic1005
* 10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
* 06:41 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9c9bf0f4: Use LCStoreStaticArray unconditionally (duration: 03m 02s)
* 10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 05:26 ori: Cleaned up now-unused hhbc files from /run/hhvm/cache on job runners
* 10:13 moritzm: installing emacs security updates on bullseye
* 04:58 ori: Enabling LCStoreStaticArray in production. May be reverted by running: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"' on palladium.
* 10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 04:48 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Follow-up for Ieb62ee050e: allow LCStoreStaticArray in server mode (duration: 00m 13s)
* 10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci [[phab:T326531|T326531]]
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
* 10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
* 10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
* 08:59 moritzm: installing ping2003 [[phab:T273509|T273509]]
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
* 08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
* 01:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
* 01:55 ejegg: payments-wiki upgraded from {{Gerrit|3cf03933}} to {{Gerrit|3d882ac7}}
* 01:12 ejegg: payments-wiki upgraded from {{Gerrit|fcb9ab60}} to {{Gerrit|3cf03933}}


== 2015-07-13 ==
== 2023-01-19 ==
* 23:22 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/VisualEditor: SWAT (duration: 00m 11s)
* 21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
* 23:11 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Add title to Parsoid exception logging (duration: 00m 12s)
* 21:42 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] (duration: 10m 38s)
* 22:45 logmsgbot: legoktm Synchronized wmf-config: Revert "Set $wgCentralAuthStrict = true;" (duration: 00m 13s)
* 21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 22:41 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 13s)
* 21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
* 22:41 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 21:31 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]]
* 22:16 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/User.php: Add 'AuthPluginStrict' log to identify users who are unable to authenticate (duration: 00m 13s)
* 21:27 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]] (duration: 08m 26s)
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 12s)
* 21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/Hooks.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 13s)
* 21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
* 22:13 ejegg: updated payments from ec34ebf61e5962f66b807abdcb519ff323d41e8e to 4ca95d55a9745c05ccfbb16ee6f23a6f75328824
* 21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
* 22:00 manybubbles: es1.6 step 4: upgrade elastic1003
* 21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:54 ori: Debugging metric issue on graphite1001, brief stats drop possible
* 21:18 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]]
* 21:32 legoktm: renaming ~3k users who were originally missed for SULF
* 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/Hooks.php: (no message) (duration: 00m 12s)
* 20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: (no message) (duration: 00m 13s)
* 20:05 zabe@deploy1002: Started scap: fix k8s drift
* 20:42 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: f9c89d2814: Revert "Revert Count API module instantiations and Hook runs" (duration: 00m 13s)
* 20:02 zabe@deploy1002: Finished scap: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] (duration: 14m 01s)
* 20:30 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ieb62ee05: Temporary hack to facilitate migration of l10n cache implementations (duration: 00m 11s)
* 19:49 zabe@deploy1002: zabe: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:42 hoo: Updated Wikidata's property suggester with data from today's json dump
* 19:48 zabe@deploy1002: Started scap: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]]
* 19:24 manybubbles_: es1.6 step 3: upgrade elastic1002
* 18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # [[phab:T233004|T233004]]
* 19:08 legoktm: running populateContentModel.php --table=page on all small wikis
* 18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 19:01 andrewbogott: two of two
* 18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 19:01 mutante: morebots - are you 1.7.11 ?
* 18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 19:01 andrewbogott: one of two
* 18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 18:52 legoktm: running populateContentModel.php --table=page on testwiki
* 18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 18:29 manybubbles_: es1.6 step 2: shut down extra instance of elasticsearch on elastic1021
* 18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:39 andrewbogott: this is the second test log of three
* 18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:39 andrewbogott: this is the first test log of three
* 18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:36 mutante: included adminbot_1.7.11 in APT repo
* 18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 16:31 andrewbogott: wikidata-dev updated local puppet and rebooting property-suggester
* 18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 16:08 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 16:07 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 15:11 manybubbles_: all done SWATing.
* 17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
* 15:09 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable footer contact link on ukwiki (duration: 00m 11s)
* 17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
* 14:55 manybubbles_: After upgrading elasticsearch, its init script no longer shuts down the old version of elasticsearch, so you have to kill it manually. That means the upgrade instructions will be "special" this time around. Hopefully this is a one-time thing.
* 17:13 zabe@deploy1002: Finished scap: [[phab:T233004|T233004]] (duration: 18m 50s)
* 14:45 manybubbles_: es1.6 step 1: upgrade elasticsearch on elastic1001 -starting
* 17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
* 14:45 manybubbles_: es1.6 step 0: successfully synced new versions of plugins
* 16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
* 14:30 manybubbles_: es1.6 step 0: sync new versions of plugins
* 16:54 zabe@deploy1002: Started scap: [[phab:T233004|T233004]]
* 14:30 manybubbles_: starting the elasticsearch 1.6.0 upgrade
* 16:54 zabe@deploy1002: backport aborted: (duration: 15m 22s)
* 13:13 bblack: updating nginx/bind on cp*
* 16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - [[phab:T327161|T327161]]
* 13:07 bblack: updating openssl on cp*
* 16:44 zabe@deploy1002: Started scap: Backport for [[gerrit:881609{{!}}Add ability to start from cuc_id to populateCucComment (T233004)]]
* 13:02 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/Cite/extension.json: https://gerrit.wikimedia.org/r/#/c/224407/ - unbreak VE mobile, https://phabricator.wikimedia.org/T105686 (duration: 00m 12s)
* 16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
* 10:58 mobrovac: restbase deploying 6dec79d
* 16:27 moritzm: installing cryptsetup updates for bullseye
* 10:22 logmsgbot: ori Synchronized php-1.26wmf13/maintenance/rebuildLocalisationCache.php: 117f60a171: rebuildLocalisationCache: don't limit memory usage (duration: 00m 12s)
* 16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
* 08:52 godog: bounce graphite-web on graphite1001
* 16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
* 08:51 godog: bounce carbon daemons on graphite1001
* 16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 08:50 godog: upgrade graphite to 0.9.13 on graphite1001 and bounce one instance of carbon/cache
* 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 07:29 logmsgbot: ori Synchronized php-1.26wmf13/includes/cache/LCStoreStaticArray.php: I3f63594a4: Fix variable name (follows Ib2c5856d) (duration: 00m 11s)
* 16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
* 06:25 logmsgbot: LocalisationUpdate failed: git pull of core failed
* 16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 06:24 ori: Experimenting with altering the localisation cache implementation for testwiki, operations/mediawiki-config on tin will have a local hack for a little bit
* 15:55 sukhe: update pybal to 1.15.10 on lvs4010: [[phab:T321191|T321191]]
* 05:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 05:07:32 UTC 2015 (duration 7m 31s)
* 15:45 effie: enable puppet on C:memcached hosts
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 02:25:58 UTC 2015 (duration 25m 57s)
* 15:42 godog: bounce opensearch on logstash102[34] - [[phab:T327161|T327161]]
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:23:43+00:00
* 15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: [[phab:T321191|T321191]]
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 16s)
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:10:25+00:00
* 15:17 effie: disable puppet on all C:memcached servers to deploy 812173
* 02:10 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
* 01:47 springle: restarted labsdb1002 mysqld while troubleshooting replication
* 14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
* 14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
* 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 14:32 zabe: run populateCulComment on group2 wikis # [[phab:T327290|T327290]]
* 14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
* 12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
* 12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 [[phab:T323820|T323820]]
* 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 11:29 hnowlan: rebooting maps-codfw for updates
* 11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
* 11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
* 11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
* 11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
* 11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
* 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:44 hnowlan: rebooting maps-eqiad for updates
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
* 10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
* 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
* 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
* 09:55 moritzm: installing ping3003 [[phab:T273509|T273509]]
* 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
* 09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
* 09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 08:26 moritzm: installing sudo security updates
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
* 06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - [[phab:T327372|T327372]]
* 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
* 05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T327372|T327372]]
* 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T327372|T327372]]


== 2015-07-12 ==
== 2023-01-18 ==
* 14:59 bblack: upgraded most packages on sodium
* 23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # [[phab:T327290|T327290]]
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 23:42 cstone: civicrm upgraded from {{Gerrit|164270b0}} to {{Gerrit|f6093fb2}}
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - [[phab:T323646|T323646]]
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - [[phab:T323646|T323646]]
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 21:50 kindrobot: close UTC late backport window
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 21:50 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] (duration: 10m 45s)
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 21:39 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]]
* 21:36 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]] (duration: 13m 01s)
* 21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:23 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]]
* 21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
* 21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
* 21:03 kindrobot: start UTC late backport window
* 20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
* 20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
* 20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
* 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
* 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
* 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
* 18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]] (duration: 09m 38s)
* 18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]]
* 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
* 17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
* 17:42 jnuche@deploy1002: install-world aborted: (duration: 07m 17s)
* 17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
* 17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
* 17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
* 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
* 17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
* 17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
* 17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
* 16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
* 16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
* 16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
* 16:39 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]] (duration: 09m 27s)
* 16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]]
* 16:20 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 24s)
* 16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:11 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 15:58 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] (duration: 08m 52s)
* 15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:49 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]]
* 15:44 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 06s)
* 15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
* 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:35 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- [[phab:T327001|T327001]]
* 15:29 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 11m 30s)
* 15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
* 15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
* 15:17 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] (duration: 09m 11s)
* 15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
* 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
* 15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]]
* 15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]] (duration: 13m 04s)
* 15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
* 14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
* 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]]
* 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]] (duration: 10m 33s)
* 14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]]
* 14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]] (duration: 19m 54s)
* 14:23 moritzm: installing mod-wsgi security updates
* 14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]]
* 13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
* 13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
* 12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
* 11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
* 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
* 11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
* 11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
* 11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:16 volans@cumin1001: START - Cookbook sre.network.cf
* 11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:15 volans@cumin1001: START - Cookbook sre.network.cf
* 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
* 11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:11 volans@cumin2002: START - Cookbook sre.network.cf
* 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:10 volans@cumin1001: START - Cookbook sre.network.cf
* 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:10 volans@cumin1001: START - Cookbook sre.network.cf
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
* 10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:59 volans@cumin1001: START - Cookbook sre.network.cf
* 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
* 10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
* 10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
* 10:21 zabe@deploy1002: Finished scap: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]] (duration: 09m 17s)
* 10:14 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 10:12 zabe@deploy1002: Started scap: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]]
* 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:49 godog: start migration from webperf1004 to arclamp1001 - [[phab:T319434|T319434]]
* 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
* 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
* 09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
* 09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]] (duration: 08m 20s)
* 09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
* 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
* 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
* 01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
* 00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # [[phab:T327118|T327118]]
* 00:26 zabe@deploy1002: Finished scap: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]] (duration: 08m 29s)
* 00:20 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 00:18 zabe@deploy1002: Started scap: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]]
* 00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # [[phab:T312153|T312153]]
* 00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # [[phab:T312153|T312153]]


== 2015-07-11 ==
== 2023-01-17 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request [[:phab:T327149{{!}}T327149]]" # [[phab:T327149|T327149]]
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 23:33 zabe@deploy1002: Finished scap: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]] (duration: 09m 58s)
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 23:25 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 23:24 zabe@deploy1002: Started scap: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]]
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 23:19 zabe@deploy1002: Finished scap: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]] (duration: 11m 46s)
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 23:09 zabe@deploy1002: zabe and dreamyjazz and zabe: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 04:12 bd808: rebooting logstash1006
* 23:07 zabe@deploy1002: Started scap: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]]
* 04:06 bd808: logstash1005 fully recovered all shards
* 23:06 zabe@deploy1002: Finished scap: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]] (duration: 10m 29s)
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 22:57 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 22:55 zabe@deploy1002: Started scap: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]]
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 22:51 bblack: repooling codfw
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 22:48 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]] (duration: 10m 34s)
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 22:38 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]]
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 22:30 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=non-existent1001
* 00:34 bd808: rebooting logstash1005
* 22:27 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]] (duration: 09m 42s)
* 00:30 bd808: logstash1004 fully recovered all shards
* 22:25 bblack: cp2031: restart ats-be
* 22:20 ebernhardson@deploy1002: ebernhardson and ebernhardson: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:18 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]]
* 22:14 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]] (duration: 10m 43s)
* 22:05 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:04 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]]
* 21:54 ebernhardson: Finished scap: Backport for [[gerrit:880913{{!}}Table of contents Collapse/Expand not working (T327064)]]
* 21:54 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]] (duration: 09m 20s)
* 21:52 zabe: zabe@mwmaint1002:~$ mwscript extensions/CheckUser/maintenance/populateCulComment.php --wiki testwiki
* 21:46 ebernhardson@deploy1002: ebernhardson and trainbranchbot: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:44 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]]
* 21:42 ebernhardson@deploy1002: Sync cancelled.
* 21:35 ebernhardson@deploy1002: ebernhardson and dreamyjazz: Backport for [[gerrit:879653{{!}}Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:34 ebernhardson: scap also backporting [[gerrit:880913{{!}}Table of contents Collapse/Expand not working (T327064)]]
* 21:34 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:879653{{!}}Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)]]
* 21:29 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]] (duration: 12m 21s)
* 21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 21:17 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]]
* 21:00 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (had been left depooled from previous powercycle)
* 20:47 ryankemper: [WDQS] Depooled `wdqs1016`
* 20:25 herron: ran preferred-replica-election on kafka-logging codfw to clear replica imbalance
* 20:18 ryankemper: [WDQS] Restart blazegraph on `wdqs1016` to clear alert: `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
* 20:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.19  refs [[phab:T325582|T325582]]
* 20:04 eileen: config revision changed from {{Gerrit|2e5cee3c}} to {{Gerrit|7425df0b}}
* 19:50 ryankemper: [[phab:T327175|T327175]] Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
* 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 19:41 zabe@deploy1002: Finished scap: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]] (duration: 10m 25s)
* 19:32 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 19:30 zabe@deploy1002: Started scap: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]]
* 18:48 zabe@deploy1002: Finished scap: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]] (duration: 09m 08s)
* 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 18:41 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:39 zabe@deploy1002: Started scap: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]]
* 18:39 zabe@deploy1002: backport aborted:  (duration: 00m 26s)
* 18:35 zabe@deploy1002: backport aborted:  (duration: 19m 41s)
* 18:29 otto@deploy1002: Finished deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac] (duration: 04m 28s)
* 18:29 otto@deploy1002: Finished deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train [airflow-dags/analytics@8d0e919] (duration: 00m 15s)
* 18:29 otto@deploy1002: Started deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train [airflow-dags/analytics@8d0e919]
* 18:25 otto@deploy1002: Started deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac]
* {{safesubst:SAL entry|1=18:25 zabe@deploy1002: zabe and matmarex and zabe: Backport for [[gerrit:880908{{!}}objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158)]], [[gerrit:878169{{!}}Use new DiscussionTools heading markup on enwiki (T314714)]], [[gerrit:879158{{!}}Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955)]], [[gerrit:879159{{!}}Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907)]], [[}}
* {{safesubst:SAL entry|1=18:23 zabe@deploy1002: Started scap: Backport for [[gerrit:880908{{!}}objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158)]], [[gerrit:878169{{!}}Use new DiscussionTools heading markup on enwiki (T314714)]], [[gerrit:879158{{!}}Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955)]], [[gerrit:879159{{!}}Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907)]], [[gerrit:879103{{!}}}}
* 18:13 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 18:10 mutante: gerrit1002/gerrit2002: sudo rmdir /srv/gerrit/jvmlogs
* 18:07 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 18:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 18:05 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 18:01 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
* 17:58 jynus: restarted es5 codfw backup
* 17:54 bblack: authdns1001: restart confd
* 17:27 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=aqs,name=codfw
* 17:19 effie: pooling back codfw services
* 17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
* 17:01 effie: restarting confd on deploy1002
* 16:59 effie: pooling back depooled mw servers in codfw
* 16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
* 16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
* 16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: [[phab:T325557|T325557]]
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
* 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- [[phab:T327001|T327001]]
* 14:52 urandom: disabling Cassandra hinted-handoff for codfw  -- [[phab:T327001|T327001]]
* 14:27 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 14:26 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 14:12 _joe_: try to restart cassandra-a on aqs2005
* 13:37 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
* 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw
* 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw
* 13:27 jynus: restarting manually replication on es2020, may require data check afterwards
* 13:26 _joe_: depooling all services in codfw
* 13:19 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance
* 13:15 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 13:14 oblivian@cumin1001: START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance
* 13:13 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance
* 13:13 oblivian@cumin1001: START - Cookbook sre.discovery.service-route check citoid: maintenance
* 13:08 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
* 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.*
* 12:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 12:35 moritzm: installing ipython security updates
* 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
* 11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
* 11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
* 11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
* 11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
* 10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
* 10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 10:11 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.19  refs [[phab:T325582|T325582]] (duration: 42m 26s)
* 09:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s)
* 09:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided)
* 09:28 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 09:26 jnuche@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s)
* 09:26 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19  refs [[phab:T325582|T325582]]
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:47 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] (duration: 13m 50s)
* 08:35 ladsgroup@deploy1002: ladsgroup and dreamyjazz: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 08:33 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]]
* 08:29 kartik@deploy1002: Finished scap: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] (duration: 20m 56s)
* 08:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # [[phab:T327146|T327146]]
* 08:13 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:08 kartik@deploy1002: Started scap: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]]
* 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
* 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
* 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
* 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
* 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
* 07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T326134|T326134]]
* 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
* 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T326134|T326134]]
* 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T326134|T326134]]


== 2015-07-10 ==
== 2023-01-16 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 19:43 bd808: rebooting logstash1004
* 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - [[phab:T304712|T304712]]
* 17:26 moritzm: installed python security updates on mc*
* 13:34 XioNoX: repool eqiad-eqord link - [[phab:T304712|T304712]]
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 15:29 mobrovac: restbase: restarted restbase on restbase1004
* 12:50 XioNoX: drain eqiad-eqord link - [[phab:T304712|T304712]]
* 15:25 godog: bounce cassandra on restbase1004
* 12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 13:43 godog: bounce cassandra on restbase1004
* 12:43 Amir1: power cycled db1198
* 13:37 _joe_: temporarily repooled mw1031
* 12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 12:40 godog: bounce cassandra on restbase1004
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
* 07:43 godog: reimage ms-be2013 T105213
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 02:59 elee: please log this with the year
* 11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 02:53 andrewbogott: testing the log by logging a test
* 11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 01:50 gwicke: bounced cassandra on restbase1004
* 11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 01:38 jgage: cassandra restarted on restbase1004
* 11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 00:39 urandom: starting restbase1004
* 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)
* 10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:48 moritzm: installing libtasn1-6 security updates on Bullseye
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
* 08:14 oblivian@deploy1002: Synchronized README: test null deployment for [[phab:T327041|T327041]] (duration: 07m 12s)
* 08:09 Emperor: stopped swift_rclone_sync on ms-be1069
* 07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]{{!}}10).codfw.wmnet
* 07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]{{!}}3[0-4]).codfw.wmnet
* 07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59{{!}}6[0-9]{{!}}70).codfw.wmnet
* 07:26 _joe_: restarting pybal on lvs2009
* 07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*{{!}}appservers{{!}}api)-ro,name=codfw
* 07:10 _joe_: depooling mediawiki in codfw
* 06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
* 02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
* 02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
* 02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
* 01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
* 01:35 Amir1: rolling restart of php-fpm across the fleet
* 01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
* 01:29 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] (duration: 24m 47s)
* 01:15 thcipriani@deploy1002: thcipriani and func: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 01:05 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]]


== July 9 ==
== 2023-01-14 ==
* 23:41 legoktm: deployed patch for T105413
* 09:46 godog: issue 'request system reboot member 2' - [[phab:T327001|T327001]]
* 23:07 gwicke: bounced cassandra on restbase1004
* 09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 09:19 Emperor: depool thanos-fe2002 [[phab:T327001|T327001]]
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 09:19 Emperor: depool ms-fe2010 [[phab:T327001|T327001]]
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/ - fixes OAuth, which is broken for 1.26wmf13
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 19:04 gwicke: bounced cassandra on restbase1004
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 17:54 gwicke: bounced restbase on restbase1005
* 17:32 ori: installed poolcounter on mw1154
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 16:53 gwicke: bounced cassandra on restbase1004
* 16:48 godog: reboot ms-be2013 T105213
* 16:38 gwicke: bounced cassandra on restbase1006
* 16:07 _joe_: repooling mw1152
* 15:57 godog: restart cassandra on restbase1002
* 15:34 gwicke: bounced cassandra on restbase1004
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 15:09 gwicke: bounced cassandra on restbase1004
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 14:24 godog: really upgrade python-django on graphite2001
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following http://www.ubuntu.com/usn/usn-2671-1/
* 11:34 godog: restart cassandra on restbase1001
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 02:28 twentyafterfour: restarted phd
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 02:00 springle: pkg upgrade and restart db1037
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245


== July 8 ==
== 2023-01-13 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
* 22:45 mutante: zirconium - stop puppet for role switch
* 20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
* 20:56 csteipp: deployed patches for T103022 & T103023
* 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
* 20:29 subbu: deployed parsoid sha c4cfc527
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 20:15 gwicke: bounced cassandra on restbase1001
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
* 19:32 gwicke: stopped cassandra on restbase1008
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:26 urandom: restbase rolling restart
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 moritzm: installed libwmf security updates on various systems
* 19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 17:09 gwicke: bounced cassandra on restbase1004
* 19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 09:42 ori: morebots, are you OK?
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 09:41 godog: bounce nutcracker on silver
* 18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # [[phab:T298707|T298707]]
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 17:34 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] (duration: 13m 25s)
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 17:22 thcipriani@deploy1002: thcipriani and abi: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 09:06 hashar: Jenkins registering jobs with Zuul
* 17:20 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]]
* 08:41 hashar: Jenkins is migrating old build histories. Lots of disk IO happening
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
* 08:11 hashar: shutting down Jenkins for upgrade.
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 15:24 jynus: restarted again update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)
* 15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
* 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
* 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
* 12:48 moritzm: installing bast6002 [[phab:T324974|T324974]]
* 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
* 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
* 10:53 moritzm: installing bast5003 [[phab:T324974|T324974]]
* 10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
* 09:41 moritzm: installing bast4004 [[phab:T324974|T324974]]
* 09:06 moritzm: installing bast3006 [[phab:T324974|T324974]]
* 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
* 01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
* 01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
* 01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
* 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
* 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
* 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
* 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
* 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
* 01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
* 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
* 01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
* 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
* 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
* 00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
* 00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
* 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED


== July 7 ==
== 2023-01-12 ==
* 23:54 jgage: kafka brokers 1018 & 1021 were demoted; i have triggered a leader election and they are leaders again
* 23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # [[phab:T233004|T233004]]
* 23:05 logmsgbot: catrope Synchronized visualeditor-default.dblist: Enable VE by default on labswiki (duration: 00m 12s)
* 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:56 hoo: Restarted hhvm on mw1003 "Fatal error: Function already defined: wmfLoadInitialiseSettings in /srv/mediawiki/wmf-config/CommonSettings.php on line 187"
* 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:16 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/resourceloader/ResourceLoader.php: T104769 (duration: 00m 13s)
* 23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
* 20:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf13
* 23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
* 20:00 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf13 and rebuild l10n cache (duration: 39m 41s)
* 23:08 sbassett: Deployed (temporary) security mitigations for [[phab:T326691|T326691]]
* 19:47 gwicke: restarted cassandra on restbase1005
* 22:45 mutante: people2002 - apt-get remove --purge rsync
* 19:20 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf13 and rebuild l10n cache
* 22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # [[phab:T233004|T233004]]
* 19:15 moritzm: installed PHP security updates on all trusty hosts
* 22:07 thcipriani: end UTC late backport
* 18:58 ejegg: updated payments from a17ee221db0dbde70c92e24fc188379b6dbad613 to ec34ebf61e5962f66b807abdcb519ff323d41e8e
* 22:06 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]] (duration: 09m 23s)
* 18:08 twentyafterfour: restarted apache2 on iridium (phab hotfix)
* 21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
* 17:10 robh: OTRS update appears to be functioning normally.  As such, ending maintenance window.
* 21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
* 17:06 robh: otrs is now using the new sha256 cert
* 21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref [[phab:T326668|T326668]]
* 17:00 robh: starting otrs maint window
* 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
* 16:58 _joe_: restarted HHVM on mw1026, near to OOM
* 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
* 16:47 twentyafterfour: applied hotfix for phabricator bug: https://secure.phabricator.com/D13544
* 21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:36 mutante: protactinium - manual iptables rules replaced by puppet/ferm rules
* 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
* 16:11 logmsgbot: thcipriani Synchronized php-1.26wmf12/extensions/ContentTranslation/extension.json: Remove default value for ContentTranslationCampaigns (duration: 00m 12s)
* 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
* 15:33 jynus: manually editing table mediawiki.ipblocks to fully solve a former software bug
* 21:57 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]]
* 15:12 Jeff_Green: ptr records for frack/codfw and authdns-update
* 21:56 zabe: run populateCucComment.php on testwiki # [[phab:T233004|T233004]]
* 15:10 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable ContentTranslation in enwiki [[gerrit:222991]] (duration: 00m 13s)
* 21:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]] (duration: 09m 04s)
* 14:21 jynus: dropping optin_survey_old table from enwiki
* 21:41 thcipriani@deploy1002: thcipriani and stang: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:23 akosiaris: restarting gitblit on antimony
* 21:39 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]]
* 11:31 mobrovac: restbase restarted cassandra on rb1005
* 21:37 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] (duration: 09m 10s)
* 11:26 godog: restart cassandra on restbase1004, heap exhausted
* 21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:49 godog: restarted cassandra on restbase1005, mutations through the roof
* 21:28 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]]
* 08:27 godog: set operations/puppet/cassandra git submodule repo as hidden
* 21:27 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]] (duration: 09m 25s)
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul  7 06:11:46 UTC 2015 (duration 11m 45s)
* 21:21 ejegg: restarted fundraising scheduled jobs
* 05:51 logmsgbot: krinkle Synchronized php-1.26wmf12/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.js: I3e965dda1c4 (duration: 00m 12s)
* 21:19 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-07 02:27:55+00:00
* 21:19 thcipriani@deploy1002: thcipriani and stang: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 09s)
* 21:17 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]]
* 01:12 ori: Re-pooled mw1152 at 20:46 UTC, did not log it then.
* 21:16 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] (duration: 10m 43s)
* 00:41 springle: upgrade db1041 trusty
* 21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 00:37 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/CentralAuth/includes/CreateLocalAccountJob.php: https://gerrit.wikimedia.org/r/#/c/223211/ (duration: 00m 13s)
* 21:05 thcipriani@deploy1002: Started scap: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]]
* 20:43 ejegg: rolled back CiviCRM to {{Gerrit|9afd2789}}
* 20:31 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
* 20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
* 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
* 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
* 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs [[phab:T325581|T325581]]
* 18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
* 18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
* 18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
* 18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
* 17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 17:45 mutante: powercycling mc2040 via mgmt console
* 17:34 ejegg: civicrm rolled back from {{Gerrit|7ecb5038}} to {{Gerrit|9afd2789}}
* 17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 17:08 btullis@cumin1001: Added views for new wiki: aswikiquote [[phab:T321294|T321294]]
* 17:05 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 16:31 zabe@deploy1002: Finished scap: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]] (duration: 09m 49s)
* 16:23 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:21 zabe@deploy1002: Started scap: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]]
* 16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 16:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 16:08 btullis@cumin1001: Added views for new wiki: bjnwiktionary [[phab:T312214|T312214]]
* 15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 15:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 15:44 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 15:36 btullis@cumin1001: Added views for new wiki: shnwikibooks [[phab:T321256|T321256]]
* 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 effie: Planet import in codfw (on maps2009) started at 15:26 UTC - [[phab:T314472|T314472]]
* 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
* 15:11 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
* 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
* 15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
* 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
* 14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
* 14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote [[phab:T321288|T321288]]
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
* 14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
* 14:37 moritzm: installing sqlite3 security updates on buster
* 14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
* 14:34 taavi: UTC afternoon backports done
* 14:28 taavi@deploy1002: Finished scap: Backport for [[gerrit:879101{{!}}Track callers of parseRevisionParsoidHtml.]] (duration: 09m 34s)
* 14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
* 14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
* 14:20 taavi@deploy1002: taavi and matmarex: Backport for [[gerrit:879101{{!}}Track callers of parseRevisionParsoidHtml.]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
* 14:18 taavi@deploy1002: Started scap: Backport for [[gerrit:879101{{!}}Track callers of parseRevisionParsoidHtml.]]
* 14:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
* 14:17 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 14:16 taavi@deploy1002: Finished scap: Backport for [[gerrit:871272{{!}}Allow administrators to revoke autopatroller rights on sh.WP (T325938)]] (duration: 13m 30s)
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
* 14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
* 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
* 14:05 taavi@deploy1002: taavi and aleksandar: Backport for [[gerrit:871272{{!}}Allow administrators to revoke autopatroller rights on sh.WP (T325938)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:03 taavi@deploy1002: Started scap: Backport for [[gerrit:871272{{!}}Allow administrators to revoke autopatroller rights on sh.WP (T325938)]]
* 13:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
* 13:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 13:28 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:879277{{!}}Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.]] (duration: 21m 44s)
* 13:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
* 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:08 ladsgroup@deploy1002: ladsgroup and daniel: Backport for [[gerrit:879277{{!}}Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:06 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:879277{{!}}Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.]]
* 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary [[phab:T326138|T326138]]
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
* 12:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 12:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
* 12:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 12:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
* 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
* 12:13 XioNoX: repool esams
* 12:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:08 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 12:08 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 12:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
* 11:54 XioNoX: re-seating cr2-esams fpc0 linecard - [[phab:T318783|T318783]]
* 11:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
* 11:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 11:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 11:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 11:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
* 11:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:879412{{!}}throttle: Add new rule for cswiki course (T326792)]] (duration: 07m 47s)
* 11:17 urbanecm@deploy1002: Started scap: Backport for [[gerrit:879412{{!}}throttle: Add new rule for cswiki course (T326792)]]
* 11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
* 11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
* 11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
* 11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
* 11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
* 11:11 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # [[phab:T298707|T298707]]
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
* 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
* 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
* 10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
* 10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
* 10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
* 10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
* 10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
* 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
* 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
* 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
* 10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
* 10:24 XioNoX: rollback redirect ns2 to authdns1001 - [[phab:T316532|T316532]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
* 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
* 10:01 XioNoX: reboot asw2-esams for upgrade - [[phab:T316532|T316532]]
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
* 09:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
* 09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
* 09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
* 09:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
* 09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
* 09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 09:47 btullis@cumin1001: Added views for new wiki: pcmwiki [[phab:T310879|T310879]]
* 09:46 XioNoX: redirect ns2 to authdns1001 - [[phab:T316532|T316532]]
* 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
* 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
* 09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
* 09:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
* 09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
* 09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
* 09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
* 09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
* 09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
* 09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
* 09:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
* 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
* 09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
* 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
* 08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
* 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
* 08:55 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
* 08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
* 08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
* 08:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
* 08:53 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
* 08:50 XioNoX: depool esams for network maintenance - [[phab:T316532|T316532]]
* 08:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
* 08:49 zabe: deployed updated patch for [[phab:T311337|T311337]]
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
* 08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
* 08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
* 07:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
* 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
* 07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
* 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
* 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
* 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
* 07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
* 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
* 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
* 05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
* 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
* 04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
* 03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
* 03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
* 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
* 03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
* 03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
* 03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
* 02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
* 02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
* 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
* 01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
* 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
* 01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
* 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
* 01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
* 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
* 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
* 00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
* 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
* 00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
* 00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json


== July 6 ==
== 2023-01-11 ==
* 23:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 12s)
* 23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
* 23:49 logmsgbot: krenair Synchronized w/static/images/project-logos/mrwikisource.png: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 13s)
* 23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
* 23:35 logmsgbot: krenair Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/223179/ - should be labs-only (duration: 00m 12s)
* 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
* 23:32 logmsgbot: krenair Synchronized README: https://gerrit.wikimedia.org/r/#/c/222941/ - ... (duration: 00m 13s)
* 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 23:27 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221809/ - should be a noop, just doc changes (duration: 00m 13s)
* 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 23:25 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221808/ (duration: 00m 13s)
* 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/223185/ (duration: 00m 12s)
* 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220970/ (duration: 00m 14s)
* 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
* 21:46 gwicke: restarted cassandra instance on restbase1003; was low on memory and constantly writing small chunks
* 23:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.18 refs [[phab:T325581|T325581]] (duration: 06m 57s)
* 21:30 andrewbogott: rebooting labvirt1005, again. Somehow virtualization is turned off again
* 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
* 21:12 subbu: deployed parsoid version 87a746e6
* 23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs [[phab:T325581|T325581]]
* 21:04 logmsgbot: ori Synchronized php-1.26wmf12/thumb.php: cdc75debaf: Add Content-Length header to thumb.php error responses (duration: 00m 13s)
* 23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
* 21:02 mutante: purging static-bz URL on varnish ...
* 22:51 zabe@deploy1002: Finished scap: Backport for [[gerrit:879055{{!}}Start reading from cuc_actor on group0 and group1 wikis (T233004)]], [[gerrit:879148{{!}}Start writing to rev_comment_id on group0 wikis (T299954)]], [[gerrit:879057{{!}}Stop writing to cul_user and cul_user_text on testwiki (T233004)]] (duration: 09m 28s)
* 20:39 akosiaris: upload php5_5.3.10-1ubuntu3.19-wmf1 on apt.wikimedia.org/precise-wikimedia
* 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
* 20:15 gwicke: restart cassandra instance on 1005
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
* 20:04 mobrovac: restbase restart cassandra on rb1005
* 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 19:28 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/223040/ (duration: 00m 12s)
* 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 19:11 gwicke: reduced compaction throughput from 160 to 100 mb/s across the cassandra cluster via 'nodetool -h <host> setcompactionthroughput 100'
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
* 18:51 gwicke: restarted cassandra on restbase1001 with jdk8, see T104888
* 22:44 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:879055{{!}}Start reading from cuc_actor on group0 and group1 wikis (T233004)]], [[gerrit:879148{{!}}Start writing to rev_comment_id on group0 wikis (T299954)]], [[gerrit:879057{{!}}Stop writing to cul_user and cul_user_text on testwiki (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:22 gwicke: restarted cassandra on restbase1004 with jdk8
* 22:42 zabe@deploy1002: Started scap: Backport for [[gerrit:879055{{!}}Start reading from cuc_actor on group0 and group1 wikis (T233004)]], [[gerrit:879148{{!}}Start writing to rev_comment_id on group0 wikis (T299954)]], [[gerrit:879057{{!}}Stop writing to cul_user and cul_user_text on testwiki (T233004)]]
* 17:54 Jeff_Green: authdns-update for new rigel A record
* 22:40 effie: upload memkeys_20181031-2~bullseye0_ on bullseye-wikimedia
* 17:42 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: increase db2029 traffic to normal levels (duration: 00m 12s)
* 22:39 kindrobot: close UTC late backport window
* 17:37 gwicke: upgraded restbase1005 to jdk8
* 22:38 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:878154{{!}}Fix exception in `<gallery mode="slideshow">` with missing images]], [[gerrit:879100{{!}}Fix phan error when Excimer is enabled]], [[gerrit:879098{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399)]], [[gerrit:879099{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T30106
* 17:35 gwicke: restarting cassandra instance on restbase1005: out of heap
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
* 17:10 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade(2/2) (duration: 00m 11s)
* 22:21 kindrobot@deploy1002: kindrobot and matmarex: Backport for [[gerrit:878154{{!}}Fix exception in `<gallery mode="slideshow">` with missing images]], [[gerrit:879100{{!}}Fix phan error when Excimer is enabled]], [[gerrit:879098{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399)]], [[gerrit:879099{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view
* 17:09 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade (duration: 00m 11s)
* 22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
* 16:38 jynus: upgrade and restart of db2029
* 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
* 16:35 ori: depooled mw1152
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
* 15:29 logmsgbot: krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/222993/ (duration: 22m 09s)
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:21 _joe_: repooling mw1152
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:20 _joe_: attempting dump-apc on mw1060
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
* 15:09 _joe_: depooled the HHVM imagescaler again
* 21:58 kindrobot@deploy1002: Started scap: Backport for [[gerrit:878154{{!}}Fix exception in `<gallery mode="slideshow">` with missing images]], [[gerrit:879100{{!}}Fix phan error when Excimer is enabled]], [[gerrit:879098{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399)]], [[gerrit:879099{{!}}Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063
* 15:07 logmsgbot: krenair Started scap: https://gerrit.wikimedia.org/r/#/c/222993/
* 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
* 15:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222617/ (duration: 00m 12s)
* 21:34 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:879094{{!}}Fix mustache template rendering when TOC is rerendered after an edit (T326682)]], [[gerrit:879121{{!}}Enable page tools on beta cluster]] (duration: 10m 17s)
* 14:48 moritzm: installed python security updates on analytics*, lab* and virt*
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
* 14:46 moritzm: added python-diskimage-builder 0.1.46-1+wmf1 for jessie-wikimedia on carbon
* 21:25 kindrobot@deploy1002: kindrobot and jdrewniak and jdlrobson: Backport for [[gerrit:879094{{!}}Fix mustache template rendering when TOC is rerendered after an edit (T326682)]], [[gerrit:879121{{!}}Enable page tools on beta cluster]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:43 _joe_: depooled the HHVM imagescaler, spitting 503s again.
* 21:23 kindrobot@deploy1002: Started scap: Backport for [[gerrit:879094{{!}}Fix mustache template rendering when TOC is rerendered after an edit (T326682)]], [[gerrit:879121{{!}}Enable page tools on beta cluster]]
* 14:18 mobrovac: restbase started thinning out parsoid data (local_group_wikipedia_T_parsoid_dataDVIsgzJSne8k) for >= 22 days
* 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
* 14:07 YuviPanda: restart apache on labcontrol1001 to pick up parser function change
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
* 12:57 moritzm: installed python security updates on mw*, es* and db*
* 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 12:18 logmsgbot: hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s)
* 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 12:15 logmsgbot: hoo Finished scap: Update WikibaseQuality and WikibaseQualityConstraint (duration: 25m 56s)
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
* 11:49 logmsgbot: hoo Started scap: Update WikibaseQuality and WikibaseQualityConstraint
* 21:06 kindrobot: start UTC late backport window
* 11:40 hoo: Created the `wbqc_constraints` table on wikidatawiki
* 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
* 09:02 _joe_: restarted the appserver on mw1059 with hhvm.server.apc.expire_on_sets = true, restarted the heap profiling to confirm my hypothesis on T104769
* 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
* 08:31 _joe_: restarted cassandra on rb1004. again.
* 20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
* 05:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1034, depool db1041 (duration: 00m 12s)
* 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
* 05:00 springle: stash/pull/apply CommonSettings.php on tin, which was left with modifications
* 20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul  6 04:35:45 UTC 2015 (duration 35m 44s)
* 20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-06 02:22:12+00:00
* 20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 07s)
* 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
* 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
* 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
* 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
* 19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
* 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
* 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
* 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
* 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
* 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
* 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
* 18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad [[phab:T321391|T321391]]
* 18:52 brett: Removing legacy vips from dns servers - [[phab:T239993|T239993]]
* 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
* 18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad [[phab:T321391|T321391]]
* 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
* 18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
* 18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
* 18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
* 18:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
* 18:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:28 bblack: repool eqsin edge DC
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
* 18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 18:22 btullis@cumin1001: Added views for new wiki: blkwiki [[phab:T310872|T310872]]
* 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
* 18:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 18:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:07 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
* 18:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 17:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
* 17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
* 17:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
* 17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
* 17:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 17:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
* 17:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 17:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 17:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
* 17:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json
* 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:20 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
* 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 ([[phab:T321391|T321391]])', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
* 17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
* 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad [[phab:T321391|T321391]]
* 17:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad [[phab:T321391|T321391]]
* 16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad [[phab:T321391|T321391]]
* 16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad [[phab:T321391|T321391]]
* 16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad [[phab:T321391|T321391]]
* 16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
* 16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
* 16:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye
* 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:53 zabe@deploy1002: Finished scap: [[phab:T233004|T233004]] (duration: 07m 54s)
* 15:45 zabe@deploy1002: Started scap: [[phab:T233004|T233004]]
* 15:38 zabe@deploy1002: backport aborted: (duration: 04m 25s)
* 15:38 zabe@deploy1002: sync-world aborted: Backport for [[gerrit:878870{{!}}Start reading from cul_actor everywhere (T233004)]] (duration: 04m 00s)
* 15:36 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:878870{{!}}Start reading from cul_actor everywhere (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:34 zabe@deploy1002: Started scap: Backport for [[gerrit:878870{{!}}Start reading from cul_actor everywhere (T233004)]]
* 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:21 marostegui: Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) [[phab:T326669|T326669]]
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P42982 and previous config saved to /var/cache/conftool/dbconfig/20230111-151712-marostegui.json
* 14:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:47 Lucas_WMDE: UTC afternoon backport+config window done
* 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
* 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 14:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/tests/jest/wikibase.vector.searchClient.spec.js: Backport: [[gerrit:877972{{!}}Add missing parentheses to vector search match text (T326633)]] (2/2) (duration: 06m 46s)
* 14:42 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/resources/wikibase.vector.searchClient.js: Backport: [[gerrit:877972{{!}}Add missing parentheses to vector search match text (T326633)]] (1/2) (duration: 07m 09s)
* 14:28 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:877983{{!}}Fix test constructing HTMLFormField without parent (T326621)]] (duration: 08m 38s)
* 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
* 14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
* 14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for [[gerrit:877983{{!}}Fix test constructing HTMLFormField without parent (T326621)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:877983{{!}}Fix test constructing HTMLFormField without parent (T326621)]]
* 14:14 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
* 14:10 moritzm: installing postgresql 11 security updates on maps/eqiad
* 14:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 14:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
* 13:55 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37002
* 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37002
* 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3302
* 13:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
* 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
* 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
* 13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
* 13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
* 13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
* 13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
* 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
* 13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
* 13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
* 13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
* 13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
* 13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
* 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
* 12:42 moritzm: installing postgresql 11 security updates on maps/codfw
* 12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
* 12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
* 12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
* 12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
* 12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
* 12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
* 12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
* 12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
* 12:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4004.wikimedia.org
* 12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 12:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
* 12:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
* 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
* 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 12:08 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
* 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
* 11:51 claime: repooled mw1486 in api_appserver eqiad after hardware investigation - [[phab:T326425|T326425]]
* 11:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
* 11:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
* 11:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
* 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3006.wikimedia.org
* 11:47 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1486.eqiad.wmnet
* 11:38 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=aux-k8s,service=kubesvc
* 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
* 11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
* 11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
* 11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
* 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
* 11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
* 11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
* 11:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
* 11:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3006.wikimedia.org
* 11:16 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
* 11:15 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 11:15 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.reboot-workers (exit_code=99) for Druid test cluster: Reboot Druid nodes
* 11:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
* 10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
* 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
* 10:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
* 10:31 zabe@deploy1002: Finished scap: Backport for [[gerrit:878160{{!}}Simplify expensive check (T326690)]], [[gerrit:877249{{!}}Start reading from cuc_actor on test wikis (T233004)]] (duration: 09m 34s)
* 10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
* 10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
* 10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
* 10:23 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:878160{{!}}Simplify expensive check (T326690)]], [[gerrit:877249{{!}}Start reading from cuc_actor on test wikis (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 10:21 zabe@deploy1002: Started scap: Backport for [[gerrit:878160{{!}}Simplify expensive check (T326690)]], [[gerrit:877249{{!}}Start reading from cuc_actor on test wikis (T233004)]]
* 10:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
* 10:16 moritzm: installing postgresql-11 security updates
* 10:02 XioNoX: asw1-eqsin> request system reboot all-members - [[phab:T316532|T316532]]
* 09:49 moritzm: installing python3.7 security updates
* 08:31 kartik@deploy1002: Finished scap: Backport for [[gerrit:877223{{!}}CX: Fix transformation of TranslationUnitDTO to custom array (T326278)]] (duration: 11m 45s)
* 08:21 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:877223{{!}}CX: Fix transformation of TranslationUnitDTO to custom array (T326278)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:20 kartik@deploy1002: Started scap: Backport for [[gerrit:877223{{!}}CX: Fix transformation of TranslationUnitDTO to custom array (T326278)]]
* 05:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 05:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet


== July 5 ==
== 2023-01-10 ==
* 22:30 bd808: Restarted logstash on logstash1001; hung due to OOM errors
* 23:58 krinkle@deploy1002: Finished deploy [integration/docroot@b7c82a3]: (no justification provided) (duration: 00m 15s)
* 22:03 mobrovac: rolling restart of restbase
* 23:58 krinkle@deploy1002: Started deploy [integration/docroot@b7c82a3]: (no justification provided)
* 18:11 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/222932/ (duration: 00m 12s)
* 23:46 mutante: cumin2002 - sudo systemctl status httpbb_hourly_appserver
* 17:49 logmsgbot: krenair Synchronized docroot/noc/conf: https://gerrit.wikimedia.org/r/#/c/222290/ (duration: 00m 13s)
* 23:30 zabe@deploy1002: Finished scap: Backport for [[gerrit:878207{{!}}Start writing to rev_comment_id on test wikis (T299954)]] (duration: 09m 39s)