Server Admin Log: Difference between revisions

From Wikitech-static
imported>Labslogbot
(legoktm@tin Synchronized php-1.26wmf24/extensions/Echo/Hooks.php: Remove duplicate 'MediaWiki' prefix from echo.unseen stats (duration: 00m 12s) (logmsgbot))
imported>Stashbot
(oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync)
 
Line 1: Line 1:
== 2015-09-23 ==
== 2023-05-28 ==
* 00:56 logmsgbot: legoktm@tin Synchronized php-1.26wmf24/extensions/Echo/Hooks.php: Remove duplicate 'MediaWiki' prefix from echo.unseen stats (duration: 00m 12s)
* 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 00:03 RoanKattouw: Running FlowFixLinks.php on testwiki
* 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading [[phab:T337446|T337446]]


== 2015-09-22 ==
== 2023-05-27 ==
* 23:40 mutante: renaming search mailing lists to discovery mailing lists
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 23:35 logmsgbot: krenair@tin Synchronized php-1.26wmf24/extensions/Echo: https://gerrit.wikimedia.org/r/#/c/240283/ and https://gerrit.wikimedia.org/r/#/c/240281/ (duration: 00m 13s)
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 23:18 logmsgbot: krenair@tin Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/240278/ (duration: 00m 12s)
* 00:03 tzatziki: removing 1 file for legal compliance
* 23:16 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/240259/ (duration: 00m 12s)
* 23:15 logmsgbot: krenair@tin Synchronized w/static/images/sul/wikimania.png: https://gerrit.wikimedia.org/r/#/c/239308/ (duration: 00m 11s)
* 23:14 logmsgbot: krenair@tin Synchronized w/static/images/sul/commons.png: https://gerrit.wikimedia.org/r/#/c/239308/ (duration: 00m 12s)
* 22:44 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Set $wgFlowMigrateReferenceWiki to false in production (duration: 00m 12s)
* 22:38 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow opt-in on testwiki for testing (duration: 00m 12s)
* 21:56 cwdent: updated payments from 7b08867d9c5e87f5babb4b5b9cf1f5bec5e243b3 to 8428499feb8760d63faf681d53995697a2ba0fa7
* 21:49 chasemp: unban elastic1030 from T112559
* 21:33 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf24
* 21:13 logmsgbot: twentyafterfour@tin Finished scap: Test 1.26wmf24 (duration: 50m 34s)
* 20:44 mutante: cancel backup job of bast1001 on helium because running low on disk
* 20:22 logmsgbot: twentyafterfour@tin Started scap: Test 1.26wmf24
* 16:50 robh: all mw servers returned to puppet enabled, puppet swat window over
* 16:40 awight: updated paymentswiki 153418195a45cab820bc2aacf9a4f7dbc9dde768 to 7b08867d9c5e87f5babb4b5b9cf1f5bec5e243b3
* 16:23 robh: re-enabled puppet on mw hosts, as both redirection changes are good
* 16:14 robh: re-enabling puppet on mw hosts, as the new patchset 239278 deployed and tested fine on a single host, deploying to rest
* 16:04 robh: disabling puppet across mw hosts for new configuration deployment
* 15:46 godog: running puppet on restbase2001
* 15:41 godog: stop puppet on restbase2* pending codfw expansion
* 15:31 cwdent: updated payments from 153418195a45cab820bc2aacf9a4f7dbc9dde768 to 7b08867d9c5e87f5babb4b5b9cf1f5bec5e243b3
* 14:42 cmjohnson1: shutting down elastic1005 and elastic1030 to move around within the data center
* 14:19 bblack: starting slow restart of varnish + varnish-frontend daemon processes on global text, upload, and mobile clusters for shm_reclen (all randomly blended, no parallelism, ~5 minute spacing, will take ~9 hours - FEs will lose cache data, BEs will not)
* 14:14 chasemp: depool elastic nodes for T112559
* 11:18 logmsgbot: aude@tin Synchronized php-1.26wmf23/extensions/Wikidata: Fix autocomment and change handling bugs (duration: 00m 21s)
* 10:42 logmsgbot: aude@tin Synchronized arbitraryaccess.dblist: Enable arbitrary access for Wikibooks (duration: 00m 12s)
* 10:42 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Enable data access for Wikibooks - try again for snapshot hosts (duration: 00m 12s)
* 10:35 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: Enable data access for Wikibooks (duration: 01m 12s)
* 10:17 moritzm: enabled ferm on mw1152 (videoscaler)
* 10:03 godog: finished stressdisk on restbase200[123] no errors reported
* 10:03 moritzm: enabled ferm on mw1259 (videoscaler)
* 04:35 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep 22 04:35:46 UTC 2015 (duration 35m 45s)
* 02:22 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-22 02:22:56+00:00
* 02:19 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 00s)
* 01:53 mutante: sodium - deleted salt key, revoked puppet cert, rm from icinga ..
* 00:32 ori: Disabled Puppet for 24h on hafnium and stopped ganglia-monitor. gmond was saturating CPU.


== 2015-09-21 ==
== 2023-05-26 ==
* 23:29 logmsgbot: krinkle@tin Synchronized php-1.26wmf23/extensions/NavigationTiming/modules/ext.navigationTiming.js: T112593 (duration: 00m 14s)
* 23:48 tzatziki: removing 2 files for legal compliance
* 23:21 logmsgbot: krenair@tin Synchronized php-1.26wmf22/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/239828/ (duration: 00m 21s)
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:07 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/238357/ (duration: 00m 13s)
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:26 mutante: restarting gerrit for ssh config change
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 21:15 awight: legacy PayPal listener updated from 1c9ac2e66d11bbf768ea873d6e1a2522ca9841c1 to 55aeef63f6508381e3a8b7fcabddf9a3c3b73b8e
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 20:43 cwdent: updated worldpay config on payments
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:41 mdholloway: MobileApps deployed sha1 013044e
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:39 chasemp: banning elastic1005 for T112559
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 20:29 subbu: deployed parsoid version 9984d221
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:41 urandom: temporarily stopping codfw restbase cassandra nodes to test quorum auth
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:15 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: Ieccb23f: Enable async secondary writes for mysql-multiwrite cache (on testwiki) (duration: 00m 13s)
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 18:36 ejegg: re-enabled paypal audit parser
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 18:16 cmjohnson1: disabling puppet on mw1031
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 06m 10s)
* 17:58 chasemp: banning 1030 from eqiad elastic cluster for T112559#1660068
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 17:57 ejegg: disabled paypal audit parser
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 16:08 ejegg: updated payments-wiki from 4d9d165c40070e036176dba8987243f6dbc7415e to 153418195a45cab820bc2aacf9a4f7dbc9dde768
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 15:22 logmsgbot: thcipriani@tin Synchronized php-1.26wmf23/extensions/ContentTranslation/modules/entrypoint/ext.cx.interlanguagelink.js: SWAT: Revert "Do not call cxserver to display gray interwiki link" [[gerrit:239819]] (duration: 00m 11s)
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:10 logmsgbot: thcipriani@tin Synchronized robots.txt: SWAT: Remove redundant entries from robots.txt [[gerrit:239403]] (duration: 00m 12s)
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 14:33 ottomata: restart eventlogging with mysql consumer replace=True (AKA INSERT IGNORE)
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 14:09 godog: rolling restart restbase in production after cassandra credentials change
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 12:53 godog: rolling restart cassandra after enabling dc encryption, no nodes in codfw yet
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 12:01 moritzm: repooled mw1160 (for T104969)
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:54 moritzm: depooled mw1160 (for T104969)
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 11:51 moritzm: repooled mw1158, mw1159 (for T104969)
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:39 moritzm: depooled mw1158, mw1159 (for T104969)
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:37 moritzm: depooled and repooled mw1156, mw1157 (for T104969)
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:26 moritzm: repooled mw1154, mw1155 (for T104969)
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:21 moritzm: depooled mw1154, mw1155 (for T104969)
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 10:39 moritzm: repooled mw1026-mw1029 and mw1110-mw1113 (for T104968)
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:24 moritzm: depooled mw1026-mw1029 and mw1110-mw1113 (for T104968)
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 10:17 moritzm: repooled mw1100-mw1109 (for T104968)
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 10:17 godog: create restbase user on cassandra cluster
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 10:06 moritzm: depooled mw1100-mw1109 (for T104968)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 09:56 moritzm: repooled mw1140 and mw1142-mw1148 (for T104968)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 09:41 moritzm: depooled mw1140 and mw1142-mw1148 (for T104968)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 09:36 moritzm: repooled mw1130-mw1139 (for T104968)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 09:22 moritzm: depooled mw1130-mw1139 (for T104968)
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 09:14 moritzm: repooled mw1120-mw1129 (for T104968)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 09:02 moritzm: depooled mw1120-mw1129 (for T104968)
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 08:48 moritzm: repooled mw1189 and mw1200-mw1208 (for T104968)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:33 moritzm: depooled mw1189 and mw1200-mw1208 (for T104968)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 08:29 godog: switch to 'restbase' cassandra user on restbase test cluster
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 08:29 moritzm: repooled mw1190-mw1195 and mw1197-mw1199 (for T104968)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 08:21 _joe_: restarted the logstash agent on logstash1003, OOM'd
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 08:18 moritzm: depooled mw1190-mw1195 and mw1197-mw1199 (for T104968)
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 08:07 _joe_: installing the new HHVM package on the api canaries
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 08:04 moritzm: repooled mw1221-mw1229 (for T104968)
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 07:53 moritzm: depooled mw1221-mw1229 (for T104968)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 07:49 moritzm: repooled mw1230-mw1235 (for T104968)
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 07:43 _joe_: installing the new hhvm package on the canary appservers
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:08 moritzm: depooled mw1230-mw1235 (for T104968)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 04:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Sep 21 04:31:03 UTC 2015 (duration 31m 2s)
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 02:23 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-21 02:23:12+00:00
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 25s)
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 02:06 MaxSem: Maps: created indexes on admin. <3 Postgres :(
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 01:56 bblack: downtimed eqiad ipv6 text/upload alerts as well, as with mobile above ( 1 301 TLS Redirect - 505 bytes in 1.008 second response time
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 01:46 bblack: downtimed the "LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6" alert for now ( https://phabricator.wikimedia.org/T113154 )
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - [[phab:T329366|T329366]]
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)


== 2015-09-20 ==
== 2023-05-25 ==
* 22:34 yuvipanda: reload pybal on lvs1012
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 17:01 bblack: repooling cp1046 varnish-be + varnish-be-rand in confctl, fresh storage, purge queue caught up - T113184
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:44 bblack: depooling cp1046 varnish-be + varnish-be-rand in confctl, wiping storage, re-pooling - T113184
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 07:24 paravoid: temporarily disabling puppet on fermium and applying antispam countermeasures
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 04:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep 20 04:29:16 UTC 2015 (duration 29m 15s)
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 02:22 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-20 02:22:53+00:00
* 20:47 TheresNoTime: close UTC late backport
* 02:19 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 12s)
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s)
* 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
* 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s)
* 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]]
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000  to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}


== 2015-09-19 ==
== 2023-05-24 ==
* 23:12 urandom: beginning Cassandra repair on restbase1005 (nodetool repair -pr)
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 23:08 urandom: beginning Cassandra repair on restbase1004 (nodetool repair -pr)
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:56 jynus: restarting once more gitblit, last chance
* 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]]
* 19:04 paravoid: salt rm /etc/systemd/system/txstatsd.service from all cp*, leftover because of ::txstatsd::decommission (removed with 4a1d4e) missing it
* 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s)
* 19:00 ejegg|away: updated SmashPig from d5895428d1d8ebc5a6e172e8cdec6dbec0b10d85 to d1baa32267eaad7d69b47c657f4853eb306fad6b
* 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 18:45 _joe_: restarted gitblit. I will now substitute myself with a clever perl one-liner.
* 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 18:38 paravoid: pooling back cp1046 to pybal eqiad/mobile, has stayed stable
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s)
* 18:34 paravoid: reactivating BGP with GTT @ eqiad
* 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:42 _joe_: cp1046 dead on console again, powercycling to inspect it
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 05:49 logmsgbot: aaron@tin Synchronized php-1.26wmf23/extensions/TitleBlacklist: 80d3a21a51f9c54ed2d94 (duration: 00m 12s)
* 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 05:22 paravoid: pybal-depooling cp1046 from eqiad/mobile until further investigation
* 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 05:21 paravoid: powercycling cp1046, dead on console
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 05:01 awight: deploy SmashPig config to limit weekend spam
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 04:40 awight: update crm from 15ea14f61338ca9f34e9ccb9f56eae14a161380a to 9fa38d06a75363a8009bce7ced190e39c75b68bc
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 04:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Sep 19 04:28:59 UTC 2015 (duration 28m 57s)
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:23 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-19 02:23:29+00:00
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 05s)
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]]
* 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}}
* 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s)
* 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance
* 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]]
* 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s)
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]]
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]]
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}}
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}}
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:18 aqu: analytics-refinery, about to deploy
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s)
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s)
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]]
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]]
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s)
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]]
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s)
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]]
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s)
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]]
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s)
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]]
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s)
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances because the core search backend was dead.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]]
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions {{Gerrit|5227369}} --mark [[phab:T337392|T337392]]` [[phab:T337392|T337392]]
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]]
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]]
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s)
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]]
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
* 01:19 mutante: contint2001 - jenkins started again
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:45 mutante: short maintenance on main contint server (jenkins)
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance


== 2015-09-18 ==
== 2023-05-23 ==
* 23:22 awight: update fundraising-tools from 3e0e3ae799a507b378d0ece3e71631b10b361329 to e1b60fa2c258fd4ff55905b03a4d8886132278c1
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 20:52 ebernhardson: restart es on elastic1025 to disable dynamic scripting
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 20:34 gwicke: dropped by_ns indexes on restbase title_revisions tables
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 19:54 gwicke: finished deploy of restbase daacf4daa
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 19:45 gwicke: re-enabled puppet on restbase100*
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 19:35 gwicke: canary deploy of restbase daacf4daa on restbase1001; moving forward so that we can re-enable puppet over the weekend.
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 18:38 cwdent: updated payments from 1bdd287b083032ff418434ad6bb6920735af918a to 4d9d165c40070e036176dba8987243f6dbc7415e
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 17:54 logmsgbot: ebernhardson@tin Synchronized wmf-config/CommonSettings.php: Replace insecure es usage with usage of a plugin (duration: 00m 12s)
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:41 mutante: mailman now on 2.1.18 and jessie
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:14 dcausse: elastic in eqiad plugin updates: restarting elastic1021
* 21:02 TheresNoTime: close UTC late backport window
* 16:07 paravoid: deactivating BGP with GTT @ eqiad
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s)
* 15:20 godog: create restbase user on cassandra test cluster
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:55 dcausse: elastic in eqiad plugin updates: restarting elastic1020
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:22 bblack: committing lvs1007-1012 port/vlan changes for asw-d-eqiad (but leaving all 6 LVS ports in "disabled" state - T112781 )
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:14 bblack: committing lvs1007-12 port/vlan changes for asw-b-eqiad, round 3...
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 mutante: sodium - stopped exim - rsyncing lists to fermium
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:10 dcausse: elastic in eqiad plugin updates: restarting elastic1019
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:07 mutante: stopped mailman on sodium
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:01 bblack: rollback on asw-b-eqiad changes above
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]]
* 13:56 bblack: committing eqiad lvs1007-1012 port/vlan changes for asw-b-eqiad
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s)
* 13:20 bblack: committing eqiad lvs1007-12 port/vlan changes for asw-c-eqiad
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:16 bblack: committing eqiad lvs1007-12 port/vlan changes for asw2-a5-eqiad
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}}
* 13:12 dcausse: elastic in eqiad plugin updates: restarting elastic1018
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]]
* 12:21 godog: restart logstash on logstash1001, OOM in logs
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:55 dcausse: elastic in eqiad plugin updates: restarting elastic1017
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:06 dcausse: elastic in eqiad plugin updates: restarting elastic1016
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:28 moritzm: restarted salt-master on palladium
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:46 moritzm: installed openldap security updates on plutonium
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:37 moritzm: installed openldap security updates on pollux
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:33 dcausse: elastic in eqiad plugin updates: restarting elastic1015
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:22 dcausse: elastic in eqiad plugin updates: restarting elastic1014
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:21 dcausse: elastic in eqiad plugin updates: restarting elastic1013
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 06:15 dcausse: elastic in eqiad plugin updates: restarting elastic1012
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 04:37 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep 18 04:37:42 UTC 2015 (duration 37m 41s)
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:31 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-18 02:31:49+00:00
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:28 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 08s)
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:21 logmsgbot: krenair@tin Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/218353/ (duration: 00m 12s)
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:21 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218353/ (duration: 00m 11s)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:13 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237149/ (duration: 00m 12s)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:07 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234544/ (duration: 00m 12s)
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:58 logmsgbot: ori@tin Synchronized php-1.26wmf22/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: ResourceLoaderModule: cache file content hash (duration: 00m 12s)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 01:58 logmsgbot: ori@tin Synchronized php-1.26wmf23/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: ResourceLoaderModule: cache file content hash (duration: 00m 11s)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 01:57 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://phabricator.wikimedia.org/T106264 (duration: 00m 12s)
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:36 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237331/ (duration: 00m 12s)
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:14 ori: restarted logstash on logstash1001
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}


== 2015-09-17 ==
== 2023-05-22 ==
* 23:39 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/238978/ (duration: 00m 12s)
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 23:05 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta-only change (duration: 00m 12s)
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 23:04 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta-only change (duration: 00m 12s)
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 22:53 gwicke: puppet on restbase cluster disabled since about 21:30 UTC for gradual deploy; ran into minor issue in staging, which is now being addressed, after which deploy will continue
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
* 21:22 logmsgbot: ori@tin Synchronized php-1.26wmf22/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: Use MD4 to compute file hash rather than SHA1 (duration: 00m 13s)
* 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]]
* 21:22 logmsgbot: ori@tin Synchronized php-1.26wmf23/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: Use MD4 to compute file hash rather than SHA1 (duration: 00m 12s)
* 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
* 20:44 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:22 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/239206/ (duration: 00m 12s)
* 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 19:46 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: (no message) (duration: 00m 12s)
* 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 19:41 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/239181/ (duration: 00m 14s)
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 19:12 mutante: powercycling unresponsive mw1005
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
* 18:14 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf23
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
* 17:38 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/includes/registration/ExtensionRegistry.php: registration: Fix merging of array_plus (duration: 00m 13s)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:35 logmsgbot: legoktm@tin Synchronized php-1.26wmf23/includes/registration/ExtensionRegistry.php: registration: Fix merging of array_plus (duration: 00m 11s)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:43 chasemp: restart elasticsearch on 1005
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:16 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235900/ (duration: 00m 12s)
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 15:15 dcausse: elastic in eqiad plugin updates: restarting elastic1004 (take 2)
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
* 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable Suggestions in ptwiki [[gerrit:238097]] (duration: 00m 13s)
* 20:27 TheresNoTime: close UTC late backport window
* 14:22 mutante: analytics1029 - Failed to start Hadoop datanode
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s)
* 14:20 mutante: starting hadoop datanode on analytics1029
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:14 _joe_: reimaging tmh1001 to mw1259
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]]
* 14:11 jynus: stopping replication and applying schema change to db1051
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s)
* 14:05 dcausse: elastic in eqiad plugin updates: can't restart elastic1004 (2 timeouts when disabling replication, too much load?), waiting for more shards to rebalance...
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 13:58 dcausse: elastic in eqiad plugin updates: restarting elastic1004
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 13:50 moritzm: repooled mw1236-mw1239 (T104968)
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:34 moritzm: depooled mw1236-mw1239 (T104968)
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]]
* 13:26 moritzm: repooled mw1090-mw1099 (T104968)
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:16 moritzm: depooled mw1090-mw1099 (T104968)
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:13 moritzm: repooled mw1080-mw1089 (T104968)
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:05 moritzm: depooled mw1080-mw1089 (T104968)
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:01 moritzm: repooled mw1070-mw1079 (T104968)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:49 moritzm: depooled mw1070-mw1079 (T104968)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:35 moritzm: repooled mw1060 and mw1062-mw1069 (T104968)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 moritzm: depooled mw1060 and mw1062-mw1069 (T104968) (not repooled)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 moritzm: repooled mw1060 and mw1062-mw1069 (T104968)
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
* 12:16 moritzm: repooled mw1050-mw1059
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
* 12:04 moritzm: depooled mw1050-mw1059
* 16:58 XioNoX: push mgmt_junos to all L2 switches
* 11:39 moritzm: repooled mw1040 and mw1042-mw1049 (T104968)
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
* 11:36 dcausse: elastic in eqiad plugin updates: restarting elastic1003
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
* 11:26 moritzm: typoed earlier entry: "mw1032-mw1039" instead of "mw1032-mw1239"
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
* 11:26 moritzm: depooled mw1040 and mw1042-mw1049 (T104968)
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
* 11:18 moritzm: repooled mw1030 and mw1032-mw1239 (T104968)
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 11:03 moritzm: depooled mw1030 and mw1032-mw1239 (T104968)
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 10:35 moritzm: repooled mw1250-mw1258 (T104968)
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 10:27 moritzm: depooled mw1250-mw1258 (T104968)
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 10:25 _joe_: killing temporarily subra
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 10:24 moritzm: repooled mw1240-mw1249 (T104968)
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 10:19 _joe_: experimenting with poolcounter issues on subra
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:18 logmsgbot: oblivian@tin Synchronized wmf-config/PoolCounterSettings-codfw.php: Use codfw poolcounters in codfw (duration: 00m 12s)
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 10:12 moritzm: depooled mw1240-mw1249 (T104968)
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:12 dcausse: elastic in eqiad plugin updates: restarting elastic1002
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 10:05 logmsgbot: hoo@tin Synchronized wmf-config/: Set 'repoConceptBaseUri' for all Wikibase clients (duration: 00m 13s)
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
* 10:00 dcausse: elastic in eqiad plugin updates: unfreezing indices
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 09:48 dcausse: elastic in eqiad plugin updates: no more groovy in warmers, waiting for a few more shards to move in elastic1001 and will unfreeze indices to test warmers
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 09:39 dcausse: elastic in eqiad plugin updates: deleting warmers manually for old unused indices (eswikisource_content_1415240352, ruwiki_content_1415302164, thwiki_content_1415318677). We will have to remove these indices.
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
* 09:39 paravoid: repooling ulsfo US-West traffic back to ulsfo for the first time since May :)
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
* 09:01 dcausse: elastic in eqiad plugin updates: updating warmers on all wikis
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 08:58 paravoid: penalizing ulsfo-eqiad direct MPLS links to higher OSPF weights
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
* 08:57 paravoid: adjusting OSPF weights to be latency-based across the US network
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 08:53 _joe_: removed iptables rules for dropping traffic to helium on mw1017
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
* 08:52 dcausse: elastic in eqiad plugin updates: index warmer queries are outdated with inline groovy script, updating warmers on warwiki first to test
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]]
* 08:05 paravoid: eqiad-codfw -> eqiad-eqord-codfw migration
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
* 07:49 moritzm: repooled mw1180-mw1188 (T104968)
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s)
* 07:42 dcausse: elastic in eqiad plugin updates: restarting elastic1001
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
* 07:42 moritzm: depooled mw1180-mw1188 (T104968)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 07:37 moritzm: repooled mw1170-mw1179 (T104968)
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 07:36 dcausse: elastic in eqiad plugin updates: freezing indices
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
* 07:27 moritzm: depooled mw1170-mw1179 (T104968)
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
* 07:14 _joe_: uploading new HHVM package
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:07 moritzm: repooled mw1161-1168 (T104968)
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 06:57 moritzm: depooled mw1161-1168 (T104968)
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 06:45 moritzm: repooled mw1209-mw1220 with ferm enabled
* 10:02 moritzm: installing updated usb.ids packages for Bullseye
* 06:33 moritzm: depooling mw1209-mw1220 (in two steps)
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 05:47 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep 17 05:47:47 UTC 2015 (duration 47m 46s)
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
* 03:06 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-17 03:06:33+00:00
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
* 03:03 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 30s)
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 02:45 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-17 02:45:48+00:00
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 02:39 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 11m 11s)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
* 00:35 cwdent: updated payments from 155cdeb737c01baf62551292764fd2f5a93a9a63 to 1bdd287b083032ff418434ad6bb6920735af918a
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]]
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
* 08:22 moritzm: installing systemd security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json


== 2015-09-16 ==
== 2023-05-21 ==
* 23:27 bblack: updating eqiad switch configs for lvs1007-1012 vlan/trunk settings
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 23:19 logmsgbot: krenair@tin Synchronized php-1.26wmf23/extensions/MobileFrontend/resources/mobile.overlays/Overlay.less: https://gerrit.wikimedia.org/r/#/c/238865/ (duration: 00m 11s)
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 23:13 gwicke: started `nodetool rebuild -- eqiad` on restbase-test200{1,2}
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 23:03 cwdent: updated payments from 9fc8ab40b7f70c7b588c2b9e7b5c94b1f893faa1 to 155cdeb737c01baf62551292764fd2f5a93a9a63
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 22:26 ejegg: updated SmashPig from fdb053efa617162ac9f695e493c390987a069140 to d5895428d1d8ebc5a6e172e8cdec6dbec0b10d85
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 22:08 urandom: disabling puppet in RESTBase eqiad staging cluster to test new code and config
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 22:08 ottomata: powercycling analytics1029, it is down?
* 20:47 cscott: updated OCG to version 4032a596ce6eb442b02cc6ee9b79263b1eb23275
* 19:42 ejegg: updated crm from abc34b87ee9d1dbb1176f1929a3d748e1ee5ac7b to 15ea14f61338ca9f34e9ccb9f56eae14a161380a
* 19:38 ori: Deployed statsv 0bfd9f06f / change I050a12d3b
* 18:47 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf23
* 18:38 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf23: syncing wmf23 ahead of deployment to group1 (duration: 01m 35s)
* 17:34 paravoid: asw-d-eqiad: toggling RE mastership again
* 17:26 godog: stop puppet on restbase* to apply https://gerrit.wikimedia.org/r/#/c/238738/ / merge / reenable puppet
* 16:54 _joe_: turned on the hhvm tmh, stopping the zend ones for testing
* 16:44 logmsgbot: oblivian@tin Synchronized wmf-config/CommonSettings.php: use ffmpeg wherever possible (duration: 00m 12s)
* 16:16 bblack: upgrading pybal on lvs400[12]
* 16:12 bblack: upgrading pybal on lvs400[34], lvs300[34]
* 16:08 bblack: upgrading pybal on lvs200[123]
* 16:05 bblack: upgrading pybal on lvs200[456]
* 15:44 _joe_: uploading pybal 1.10 to reprepro, installing to the test cluster
* 15:24 moritzm: uploaded debdeploy 0.0.6 to apt.wikimedia.org
* 15:10 hashar: Started using Nodepool spawned instances.  Moved integration-jjb-config-diff Jenkins job to Nodepool with https://gerrit.wikimedia.org/r/#/c/238752/  . See also: https://phabricator.wikimedia.org/T112750
* 15:05 logmsgbot: demon@tin Synchronized wmf-config/CommonSettings.php: Add m.wikidata.org to wgCrossSiteAJAXdomains (duration: 00m 12s)
* 14:51 _joe_: experimenting on testwiki for poolcounter failure scenarios
* 14:45 moritzm: enabled ferm on mw1010 (jobrunner) in eqiad
* 14:27 paravoid: asw-d-eqiad: toggling RE mastership
* 14:18 paravoid: disabling/ignoring asw-d-eqiad @ librenms
* 14:09 jynus: upgrading and restarting db1051
* 13:57 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1051 for maintenance (duration: 00m 12s)
* 13:40 urandom: initiating Cassandra repair on restbase1007 (nodetool repair -pr)
* 13:40 logmsgbot: catrope@tin Synchronized php-1.26wmf23: (no message) (duration: 01m 37s)
* 13:35 moritzm: repooled mw1149-mw1151 (with ferm enabled)
* 13:24 moritzm: depooled mw1149-mw1151 (for enabling ferm)
* 13:19 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Reverting depool of es1055 (duration: 00m 12s)
* 13:15 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1055 for maintenance (duration: 00m 12s)
* 13:03 paravoid: disabling asw-d-eqiad xe-8/0/23, xe-8/0/24, xe-8/0/25, xe-8/0/26, xe-8/0/27, xe-8/0/28; servers reboot-looping -> asw-d's SNMP unhappy -> librenms unhappy -> faidon's mailbox unhappy
* 12:48 moritzm: repooled mw1115-mw1117, mw1119 (with ferm enabled)
* 12:42 moritzm: depooling mw1115-mw1117, mw1119 (mw1118 was already depooled) to enable ferm
* 11:32 moritzm: repooled mw1019-mw1025 with ferm enabled
* 11:24 jynus: making db1069 a sibling of db1055 (s1)
* 11:13 godog: create restbase user on cassandra test cluster
* 11:07 moritzm: depooled mw1019-mw1025 (to enable ferm)
* 10:52 logmsgbot: catrope@tin Synchronized php-1.26wmf23: (no message) (duration: 02m 04s)
* 10:49 logmsgbot: catrope@tin Synchronized php-1.26wmf22: (no message) (duration: 02m 12s)
* 10:48 jynus: reenabling semisync on db1072 and db1073
* 10:47 logmsgbot: catrope@tin scap aborted: (no message) (duration: 00m 21s)
* 10:47 logmsgbot: catrope@tin Started scap: (no message)
* 10:24 logmsgbot: catrope@tin Synchronized php-1.26wmf23/includes/changes/EnhancedChangesList.php: T112738 (duration: 00m 12s)
* 10:09 logmsgbot: aude@tin Synchronized arbitraryaccess.dblist: (no message) (duration: 00m 11s)
* 09:37 awight: ruthlessly disabled PayPal IPN listener
* 08:12 moritzm: repooled mw1153 with ferm enabled
* 07:57 jynus: truncated some tables from ContentTranslation extension on x1
* 07:57 moritzm: depooled mw1153 (it's an image scaler, of course) to enable ferm
* 07:56 moritzm: depooled mw1153 (videoscaler) to enable ferm
* 06:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Sep 16 06:31:58 UTC 2015 (duration 31m 57s)
* 03:28 logmsgbot: ori@tin Synchronized php-1.26wmf22/vendor/monolog/monolog/src/Monolog/Logger.php: Iccfda47689: monolog: Don't waste milliseconds counting microseconds (duration: 00m 12s)
* 03:27 logmsgbot: ori@tin Synchronized php-1.26wmf23/vendor/monolog/monolog/src/Monolog/Logger.php: Iccfda47689: monolog: Dont waste milliseconds counting microseconds ; sync-file php-1.26wmf22/vendor/monolog/monolog/src/Monolog/Logger.php Iccfda47689: monolog: Dont waste milliseconds counting microseconds (duration: 00m 12s)
* 03:12 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-16 03:12:08+00:00
* 03:05 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 10m 30s)
* 02:38 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-16 02:38:48+00:00
* 02:35 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 07m 02s)
* 01:03 logmsgbot: krinkle@tin Synchronized php-1.26wmf23/resources/src/mediawiki/mediawiki.js: hotfix Ia2fcd13f4 (duration: 00m 12s)
* 00:29 logmsgbot: krinkle@tin Synchronized php-1.26wmf22/resources/src/mediawiki/mediawiki.js: hotfix Ia2fcd13f4 (duration: 00m 11s)
* 00:15 logmsgbot: legoktm@tin Synchronized php-1.26wmf23/extensions/CentralAuth/includes/: Use set() for tokens with unique keys (duration: 00m 12s)
* 00:14 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/CentralAuth/includes/: Use set() for tokens with unique keys (duration: 00m 12s)
* 00:11 bblack: reinstalling lvs400[12] to jessie (traffic on 400[34], already jessie)
* 00:08 logmsgbot: krenair@tin Synchronized php-1.26wmf23/extensions/VisualEditor/modules/ve-mw/ui/styles/dialogs: https://gerrit.wikimedia.org/r/#/c/238646/ (duration: 00m 12s)


== 2015-09-15 ==
== 2023-05-20 ==
* 23:51 logmsgbot: krenair@tin Synchronized php-1.26wmf22/extensions/WikimediaEvents/modules/ext.wikimediaEvents.geoFeatures.js: https://gerrit.wikimedia.org/r/#/c/238617/ (duration: 00m 12s)
* 18:25 effie: restart varnish cp3061
* 23:48 logmsgbot: krenair@tin Synchronized php-1.26wmf23/extensions/WikimediaEvents/modules/ext.wikimediaEvents.geoFeatures.js: https://gerrit.wikimedia.org/r/#/c/238618/ (duration: 00m 12s)
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 23:42 logmsgbot: krenair@tin Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/#/c/238543/ (duration: 00m 14s)
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 23:42 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/238543/ (duration: 00m 12s)
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:40 logmsgbot: krenair@tin Synchronized php-1.26wmf22/includes/resourceloader/ResourceLoader.php: https://gerrit.wikimedia.org/r/#/c/238544/ (duration: 00m 11s)
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 23:38 logmsgbot: krenair@tin Synchronized php-1.26wmf23/includes/resourceloader/ResourceLoader.php: https://gerrit.wikimedia.org/r/#/c/238545/ (duration: 00m 11s)
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 23:24 yurik: deployed kartotherian & tilerator
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:22 logmsgbot: krenair@tin Synchronized php-1.26wmf22/extensions/EventLogging/modules/ext.eventLogging.core.js: https://gerrit.wikimedia.org/r/#/c/238512/ (duration: 00m 12s)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 23:16 logmsgbot: krenair@tin Synchronized php-1.26wmf23/extensions/EventLogging/modules/ext.eventLogging.core.js: https://gerrit.wikimedia.org/r/#/c/238513/ (duration: 00m 12s)
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 21:15 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf22/extensions/WikimediaEvents/: touch files edited in I0cb6fe37e and re-sync to cluster (duration: 00m 13s)
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:13 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf23
* 21:10 logmsgbot: twentyafterfour@tin Finished scap: sync 1.26wmf23 to testwiki, once more because mw1010 overloaded (duration: 03m 52s)
* 21:07 logmsgbot: twentyafterfour@tin Started scap: sync 1.26wmf23 to testwiki, once more because mw1010 overloaded
* 21:05 logmsgbot: twentyafterfour@tin Finished scap: sync 1.26wmf23 to testwiki, again (duration: 47m 49s)
* 20:47 mutante: mw1010 - extremely slow, finally got on and attempted to restart hhvm. load going down
* 20:17 logmsgbot: twentyafterfour@tin Started scap: sync 1.26wmf23 to testwiki, again
* 20:17 logmsgbot: twentyafterfour@tin scap aborted: sync 1.26wmf23 to testwiki (duration: 82m 58s)
* 20:05 ottomata: restarted mysql (and oozie) on analytics1027 to start mysql binlogging
* 18:54 logmsgbot: twentyafterfour@tin Started scap: sync 1.26wmf23 to testwiki
* 16:55 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1003, es1004, es1007 and es1010 for decommission (duration: 00m 12s)
* 16:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Revert depool db1055 for maintenance (duration: 00m 11s)
* 16:39 ottomata: reinstalling analytics1015
* 16:32 RoanKattouw: Putting wmf22 versions of Echo and MobileFrontend on mw1017 for testing
* 16:30 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf22/extensions/WikimediaEvents/WikimediaEvents.php: touch file that is serving old version in prod (duration: 00m 12s)
* 16:29 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf22/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSuggest.js: Touch file that is serving old version in prod (duration: 00m 12s)
* 16:27 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1055 for maintenance (duration: 00m 11s)
* 16:11 bblack: traffic DNS depooled out of codfw for now T112639
* 15:38 logmsgbot: thcipriani@tin Synchronized wmf-config: SWAT: CX: Enable suggestion for testwiki (part 2) [[gerrit:237327]] (duration: 00m 13s)
* 15:37 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable suggestion for testwiki (part 1) [[gerrit:237327]] (duration: 00m 12s)
* 15:31 logmsgbot: thcipriani@tin Synchronized php-1.26wmf22/extensions/UploadWizard/resources/jquery/jquery.mwCoolCats.js: SWAT: Do not fail horribly when invalid categories are passed [[gerrit:238421]] (duration: 00m 12s)
* 15:14 logmsgbot: thcipriani@tin Synchronized wmf-config/PoolCounterSettings-eqiad.php: SWAT: poolcounter: enable connect_timeout for testwiki [[gerrit:238109]] (duration: 00m 19s)
* 15:09 logmsgbot: thcipriani@tin Synchronized wmf-config/PoolCounterSettings-codfw.php: SWAT: poolcounter: add connect_timeout in codfw [[gerrit:238108]] (duration: 00m 12s)
* 15:06 logmsgbot: thcipriani@tin Synchronized wmf-config/Wikibase.php: SWAT: Exclude Flow topic boards and Draft NS from Special:UnconnectedPages [[gerrit:229197]] (duration: 00m 11s)
* 14:51 godog: bounce cassandra on test cluster to deploy  https://gerrit.wikimedia.org/r/236391
* 14:22 cmjohnson1: swapped disk on db1043
* 13:12 moritzm: repool mw1114 (with ferm enabled)
* 13:11 bblack: failing over LVS service in ulsfo to secondaries (400[12] pybal stopped, traffic on jessie-based 400[34])
* 12:53 moritzm: depooled mw1114 (for enabling ferm)
* 11:42 moritzm: repool mw1018 (with ferm enabled)
* 11:23 moritzm: depooled mw1018 (for enabling ferm)
* 08:53 _joe_: created a 100 G partition on a LV on copper, for /tmp
* 08:24 godog: bounce ms-be2006, xfs
* 08:22 moritzm: bumped default size of iptables connection tracking table to 256k
* 06:10 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep 15 06:10:52 UTC 2015 (duration 10m 51s)
* 02:46 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-15 02:46:50+00:00
* 02:40 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 10m 53s)
* 02:18 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/MobileFrontend: Revert Echo to 1.26wmf21 state (duration: 00m 11s)
* 02:18 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo: Revert Echo to 1.26wmf21 state (duration: 00m 12s)
* 01:30 logmsgbot: krinkle@tin Synchronized php-1.26wmf22/resources/src: T112287 (duration: 00m 11s)
* 00:49 bblack: reinstalling lvs300[34] to jessie
* 00:43 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-labs.php: noop sync of labs config change (duration: 00m 11s)
* 00:03 logmsgbot: tstarling@tin Synchronized php-1.26wmf22/extensions/ParsoidBatchAPI: for I56d28e9a for RT testing, not live yet (duration: 00m 13s)


== 2015-09-14 ==
== 2023-05-19 ==
* 23:27 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf22/extensions/WikimediaEvents/: Change bucket selection methods in CompletionSuggestions AB test (duration: 00m 12s)
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:23 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf22/extensions/UploadWizard/: Swat out badtoken fix to UploadWizard in 1.26wmf22 (duration: 00m 12s)
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 22:37 yurik: deployed tilerator
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:15 logmsgbot: ori@tin Synchronized php-1.26wmf22/extensions/TitleBlacklist: Ie44fcb500: Avoid checking blacklists in isBlacklisted() for existing titles (duration: 00m 12s)
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 21:15 mutante: labnodepool1001 - re-enable puppet and nodepool
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 20:59 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/: Hack around OOUI's icon pack being too large by creating our own (duration: 00m 12s)
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 20:53 cscott: updated OCG to version 5811056e28f2bc6408b6da96095352ab381bb11f
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 20:21 andrewbogott: graceful’d apache2 on labcontrol1001
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 20:15 subbu: deployed parsoid sha 3d5f4359
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 19:25 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/: Only load nojs Special:Notifications styles on the special page (duration: 00m 12s)
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 18:05 urandom: rebuilding restbase-test2001.codfw (nodetool rebuild -- eqiad)
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 16:12 logmsgbot: catrope@tin Synchronized php-1.26wmf22/extensions/Echo/: For real this time (duration: 00m 11s)
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]]
* 16:06 ottomata: stopping hdfs journalnode on analytics1011 to copy journal edits to new journalnodes on analytics1035 and analytics1052
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 15:46 godog: switch to openjdk-8 and bounce cassandra on restbase-test200*
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:39 bblack: reinstalling lvs4003, lvs4004 (jessie upgrade: T96375) (typo earlier)
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 15:39 bblack: reinstalling lvs4003, lvs4003 (jessie upgrade: T96375)
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)
* 15:34 logmsgbot: catrope@tin Synchronized php-1.26wmf22/extensions/Echo/: SWAT (duration: 00m 13s)
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:05 logmsgbot: krenair@tin Synchronized .gitignore: https://gerrit.wikimedia.org/r/#/c/237529/ (duration: 00m 13s)
* 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 15:05 logmsgbot: krenair@tin Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/237529/ (duration: 00m 12s)
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:04 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/237529/ (duration: 00m 11s)
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]]
* 15:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234980/ (duration: 00m 12s)
* 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 13:38 godog: stop puppet on restbase-test2001 and turn up cassandra
* 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 12:56 bblack: rebooting lvs2006 to test eth hw params stuff...
* 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 12:55 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/238125/ (duration: 00m 13s)
* 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
* 12:50 urandom: starting Cassandra repair on restbase1003 (nodetool repair -pr)
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
* 12:32 godog: enable dc encryption on cassandra test cluster and rolling restart
* 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 11:33 mobrovac: citoid deploying d569951
* 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 10:35 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1002, es1005, es1008 (duration: 00m 12s)
* 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
* 10:04 jynus: db1029 (x1-master) temporarily saturated by connections - flow was unresponsive for 10 minutes; migration partially aborted
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 09:08 jynus: applying schema change to flowdb
* 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
* 08:52 godog: rename cassandra test cluster and restart
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 08:44 godog: silence mendelevium for today, status unclear T111532
* 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]])
* 08:30 jynus: ending profiling and executing pt-query-digest on db1043 [ETA:4h]
* 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 07:52 godog: reboot ms-be1010 to pick up disk ordering change
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 04:48 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Sep 14 04:47:58 UTC 2015 (duration 47m 57s)
* 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
* 02:29 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-14 02:29:48+00:00
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
* 02:26 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 06m 59s)
* 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 01:31 Krinkle: mwscript deleteEqualMessages.php --wiki sqwiki
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
* 10:07 moritzm: installing ncurses security updates
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json


== 2015-09-13 ==
== 2023-05-18 ==
* 06:02 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep 13 06:02:52 UTC 2015 (duration 2m 51s)
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 02:40 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-13 02:40:43+00:00
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 02:34 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 10m 13s)
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
* 22:20 mutante: short down-time for zuul-merger on contint2001
* 21:47 mutante: maintenance for zuul (CI) on contint servers
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s)
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]]
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s)
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]]
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s)
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]]
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s)
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]]
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8  refs [[phab:T330215|T330215]]
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]]
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]]
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:18 TheresNoTime: closing backport window
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s)
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]]
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 06m 19s)
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 07m 00s)
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
* 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
* 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
* 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
* 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
* 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
* 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
* 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
* 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers ([[phab:T320508|T320508]])
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 ([[phab:T336483|T336483]])
* 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
* 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
* 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
* 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
* 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
* 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
* 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
* 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2
* 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
* 07:25 apergos: UTC morning backport and config training window done
* 07:15 kartik@deploy1002: Finished scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 18s)
* 07:07 kartik@deploy1002: kartik: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:06 kartik@deploy1002: Started scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl [[phab:T336833|T336833]]', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json
* 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
* 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance


== 2015-09-12 ==
== 2023-05-17 ==
* 20:15 ori: Rolling back Echo to 1.26wmf21 branch on mw1017 (testwiki) to measure increase in render-blocking CSS size
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:21 urandom: performing Cassandra repair on restbase1002 (nodetool repair -pr)
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
* 14:50 jynus: phab.wmfusercontent.org has been temporarily switched to phab.wikivoyage.org due to cert issues
* 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
* 04:52 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Sep 12 04:52:01 UTC 2015 (duration 52m 0s)
* 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 02:35 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-12 02:35:36+00:00
* 22:15 krinkle@deploy1002: Synchronized wmf-config/: [[phab:T332012|T332012]] (duration: 06m 51s)
* 02:32 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 06m 54s)
* 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
* 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request [[:phab:T333842{{!}}T333842]]"
* 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
* 20:32 urbanecm: UTC late B&C window done
* 20:29 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] (duration: 11m 36s)
* 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.
* 20:17 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]]
* 20:15 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] (duration: 12m 06s)
* 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
* 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
* 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
* 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]]
* 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
* 19:50 ejegg: payments-wiki upgraded from {{Gerrit|8988a598}} to {{Gerrit|a7567c6a}}
* 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update [[phab:T331297|T331297]]
* 19:01 Amir1: Removing db1112 from zarcillo [[phab:T336332|T336332]]
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet
* 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] (duration: 06m 22s)
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s)
* 18:03 brennen: train 1.41.0-wmf.9 ([[phab:T330215|T330215]]): no current blockers, rolling to group1 as backup-backup conductor
* 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795]
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
* 17:19 brett: Maglev LVS scheduler rollout finished in esams - [[phab:T263797|T263797]]
* 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for [[phab:T244570|T244570]]
* 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json
* 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json
* 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json
* 15:52 brett: Rolling out maglev LVS scheduler in esams - [[phab:T263797|T263797]]
* 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json
* 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json
* 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
* 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
* 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json
* 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
* 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json
* 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
* 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 06m 20s)
* 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
* 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
* 14:36 moritzm: installing jackson-databind security updates
* 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] (duration: 00m 09s)
* 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]]
* 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]]
* 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
* 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
* 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - [[phab:T336817|T336817]]
* 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - [[phab:T336726|T336726]]
* 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs
* 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 07m 30s)
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
* 13:59 taavi@deploy1002: Finished scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] (duration: 07m 24s)
* 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
* 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
* 13:54 taavi@deploy1002: matmarex and taavi: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:52 taavi@deploy1002: Started scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]]
* 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
* 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
* 13:47 taavi@deploy1002: Finished scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] (duration: 08m 11s)
* 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs
* 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
* 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
* 13:40 taavi@deploy1002: taavi and maurelio: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 taavi@deploy1002: Started scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]]
* 13:38 taavi@deploy1002: Finished scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] (duration: 07m 39s)
* 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
* 13:32 taavi@deploy1002: stang and taavi: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:30 taavi@deploy1002: Started scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]]
* 13:29 taavi@deploy1002: Finished scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] (duration: 09m 15s)
* 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
* 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs
* 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
* 13:20 taavi@deploy1002: Started scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]]
* 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:18 daniel@deploy1002: Finished scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] (duration: 11m 52s)
* 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs
* 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary
* 13:07 daniel@deploy1002: daniel: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:06 daniel@deploy1002: Started scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]]
* 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
* 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json
* 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
* 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
* 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
* 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json
* 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
* 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
* 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json
* 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json
* 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json
* 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json
* 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json
* 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json
* 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json
* 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json
* 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json
* 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
* 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
* 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json
* 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
* 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
* 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
* 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
* 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json
* 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json
* 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json
* 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
* 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json
* 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json
* 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json
* 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json
* 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
* 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
* 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - [[phab:T336726|T336726]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json
* 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json
* 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json
* 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:48 moritzm: upgrading krb1001 to Bullseye [[phab:T331695|T331695]]
* 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
* 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468
* 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json
* 07:19 kartik@deploy1002: Finished scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] (duration: 07m 22s)
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json
* 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json
* 07:11 kartik@deploy1002: Started scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]]
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json
* 07:09 kartik@deploy1002: Backport cancelled.
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json
* 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json
* 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json
* 06:01 volans: restarted ferm on ms-be1047
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json
* 05:16 marostegui: Optimize s7 on dbstore1003 [[phab:T336733|T336733]]
* 00:21 krinkle@deploy1002: Synchronized src/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 01s)
* 00:15 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 14s)
* 00:07 krinkle@deploy1002: Synchronized lib/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 51s)


== 2015-09-11 ==
== 2023-05-16 ==
* 21:21 hashar: shutdown nodepool on labnodepool1001.eqiad.wmnet until monday
* 20:59 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] (duration: 07m 18s)
* 18:01 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/: Echo regression fixes #2 (duration: 00m 12s)
* 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:43 logmsgbot: krinkle@tin Synchronized php-1.26wmf22/resources/src/mediawiki/mediawiki.js: T112232 (duration: 00m 12s)
* 20:52 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]]
* 16:37 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/: Echo regression backports (duration: 00m 12s)
* 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 16:35 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/resources/src/mediawiki/mediawiki.js: resourceloader: Document internal mw.loader#jobs property (again) (duration: 00m 13s)
* 20:49 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] (duration: 09m 19s)
* 16:33 legoktm: ssh: connect to host mw1156.eqiad.wmnet port 22: Connection timed out
* 20:41 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:32 paravoid: powercycling mw1156, multiple kernel backtraces in console output
* 20:39 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]]
* 16:32 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/resources/src/mediawiki/mediawiki.js: resourceloader: Document internal mw.loader#jobs property (duration: 01m 07s)
* 20:36 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] (duration: 07m 44s)
* 16:15 cmjohnson1: mw1031 rebooting for f/w update
* 20:30 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 16:07 bblack: enabled LRO+GRO on lvs200[123], starting pybal there again ([456] testing looks good so far)
* 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - [[phab:T263797|T263797]]
* 15:45 bblack: enabled LRO+GRO on lvs200[456] (backups). Stopping pybal on lvs200[123] to test...
* 20:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]]
* 15:11 cmjohnson1: swapping pem2 cr2-eqiad
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:03 jynus: starting nodepool in labnodepool1001
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 09:21 jynus: starting profiling of phabricator db (db1043). Very low overhead.
* 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 06:03 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep 11 06:03:00 UTC 2015 (duration 2m 59s)
* 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox
* 02:41 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-11 02:41:24+00:00
* 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 02:34 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 11m 18s)
* 19:04 sukhe: dummy run of authdns-update to confirm new hosts
* 01:16 logmsgbot: ori@tin Synchronized php-1.26wmf22/extensions/TitleBlacklist: 9bf13dbe0b, 3203b045f7 (duration: 00m 12s)
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
* 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
* 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
* 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
* 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
* 18:47 ryankemper: [WDQS] Pooled `wdqs2012`
* 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled)
* 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": [[phab:T335777|T335777]]
* 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: [[phab:T326688|T326688]]
* 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": [[phab:T326688|T326688]]
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
* 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
* 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
* 18:01 sukhe: enable puppet on A:cp-text
* 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
* 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
* 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
* 17:40 moritzm: installing avahi security updates on buster
* 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s)
* 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937]
* 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
* 17:27 brett: Rolling out maglev LVS scheduler in drmrs - [[phab:T263797|T263797]]
* 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
* 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
* 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org
* 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" [[phab:T335777|T335777]]
* 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts
* 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
* 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help )
* 16:06 mutante: gitlab-runner2003 - installed rsync client to debug an issue with rsync from inside containers, comparing against rsync from outside the container
* 15:49 sukhe: run authdns-update for CR 920314
* 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s)
* 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd]
* 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now [[phab:T336775|T336775]]
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: [[phab:T326688|T326688]]
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]
* 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 15:27 hashar: Restarting CI Jenkins
* 15:26 Emperor: rebalance codfw swift rings [[phab:T335280|T335280]]
* 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/
* 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:49 moritzm: installing libxml2 security updates on buster
* 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]]
* 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:43 hashar: Restarting CI Jenkins
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]]
* 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
* 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s)
* 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
* 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]]
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
* 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]]
* 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad
* 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 13:46 Emperor: repool ms-fe2012 [[phab:T335042|T335042]]
* 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web
* 13:32 taavi@deploy1002: Finished scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] (duration: 09m 08s)
* 13:32 Emperor: repool thanos-fe2003 [[phab:T335042|T335042]]
* 13:30 sukhe: running authdns-update to repool codfw
* 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
* 13:25 taavi@deploy1002: mazevedo and taavi: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]]
* 13:23 taavi@deploy1002: Started scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]]
* 13:01 XioNoX: asw-d-codfw> request system reboot all-members - [[phab:T335042|T335042]]
* 12:52 Emperor: depool ms-fe2012 [[phab:T335042|T335042]]
* 12:51 Emperor: depool thanos-fe2003 [[phab:T335042|T335042]]
* 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]]
* 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade
* 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade
* 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
* 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 12:35 godog: start cadvisor 0.44 upgrade to buster hosts - [[phab:T336740|T336740]]
* 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s)
* 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2]
* 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s)
* 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
* 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]]
* 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]]
* 12:21 XioNoX: disable ping offload in codfw - [[phab:T335042|T335042]]
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s)
* 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
* 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:59 kart_: Updated cxserver to 2023-05-16-061239-production ([[phab:T336657|T336657]])
* 11:57 XioNoX: stage upgrade on asw-d-codfw - [[phab:T335042|T335042]]
* 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s)
* 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw
* 11:50 marostegui: install 10.4.29 on db1151 [[phab:T336462|T336462]]
* 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw
* 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2]
* 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm
* 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance
* 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance
* 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
* 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm
* 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm
* 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm
* 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 11:00 moritzm: updated bookworm image to RC3 [[phab:T330495|T330495]]
* 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
* 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]]
* 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet
* 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
* 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
* 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - [[phab:T317799|T317799]]
* 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]]
* 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 10:13 Amir1: cleaning up echo notification table in all wikis ([[phab:T318523|T318523]])
* 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s)
* 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided)
* 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner
* 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts
* 09:21 marostegui: Optimize s5 on dbstore1003 [[phab:T336733|T336733]]
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
* 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 07:28 Emperor: restart vopsbot.service on alert1001
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 06:56 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] (duration: 06m 58s)
* 06:51 marostegui@deploy1002: marostegui: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 06:50 eileen: civicrm: revision {{Gerrit|d97a371e}}, config {{Gerrit|686d3cb4}}
* 06:49 marostegui@deploy1002: Started scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]]
* 06:49 _joe_: running docker image prune -a in build2001
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json
* 06:24 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] (duration: 07m 08s)
* 06:24 eileen: civicrm upgraded from {{Gerrit|ef7b3822}} to {{Gerrit|d97a371e}}
* 06:18 marostegui@deploy1002: marostegui: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 06:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json
* 06:05 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] (duration: 07m 21s)
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json
* 05:59 marostegui@deploy1002: marostegui: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json
* 05:58 marostegui@deploy1002: Started scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json
* 05:43 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] (duration: 07m 15s)
* 05:38 marostegui@deploy1002: marostegui: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 05:36 marostegui@deploy1002: Started scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]] (duration: 48m 47s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]


== 2015-09-10 ==
== 2023-05-15 ==
* 23:52 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237064/ (duration: 00m 11s)
* 23:37 eileen: civicrm upgraded from {{Gerrit|db6e8d69}} to {{Gerrit|ef7b3822}}
* 23:47 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237056/ (duration: 00m 11s)
* 22:02 maryum: deployed patch for [[phab:T323651|T323651]]
* 23:13 logmsgbot: krenair@tin Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/221825 (duration: 00m 13s)
* 21:51 maryum: Deployed patch for [[phab:T335612|T335612]]
* 23:04 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/224771 (duration: 00m 12s)
* 21:42 ejegg: payments-wiki upgraded from {{Gerrit|c0da741f}} to {{Gerrit|8988a598}} (and globalcollect settings deleted)
* 21:13 logmsgbot: legoktm@tin Synchronized php-1.26wmf22/extensions/Echo/modules: Align popup footer buttons to take 50% width each (duration: 00m 15s)
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:50 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: depool es1001; increase weight of es1015 and es1019 (duration: 00m 19s)
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:47 ottomata: restarting eventlogging with 12 client side processors on eventlog1001
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:31 ottomata: turning off varnishncsa eventlogging eventlistener instances on frontend caches, it is now superseded by varnishkafka
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:28 mutante: killed/restarted ganglia aggregator process for mobile-cache, upload cache, misc esams ...
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 20:22 jynus: last SCAP failed on 266/466 hosts
* 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 20:21 mutante: killed/restarted ganglia aggregator process for text-caches esams on hooft
* 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 20:17 yurik: deployed kartotherian
* 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 20:08 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1001; increase weight of es1015 and es1019 (duration: 00m 11s)
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 19:11 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf22
* 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:09 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf22/extensions/CentralNotice: deploy https://gerrit.wikimedia.org/r/#/c/237458/ (duration: 00m 12s)
* 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 18:57 twentyafterfour: restarted phd on iridium
* 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 18:51 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf22/extensions/Wikidata: Deploy wikidata patch: https://gerrit.wikimedia.org/r/#/c/237449/ (duration: 00m 19s)
* 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 18:23 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf22: deploy https://gerrit.wikimedia.org/r/#/c/237440/ (duration: 01m 42s)
* 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 18:09 cmjohnson1: reseating pem2 cr2-eqiad
* 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 16:52 akosiaris: puppetswat done
* 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 16:50 mobrovac: restbase rolling restart of rb100x
* 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 16:49 mobrovac: restbase enabled puppet on rb100x
* 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 16:13 akosiaris: started puppetSWAT
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s)
* 16:10 logmsgbot: marktraceur@tin Finished scap: Make sure codfw got the last few patches sync'd to it (duration: 07m 36s)
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS
* 16:03 logmsgbot: marktraceur@tin Started scap: Make sure codfw got the last few patches sync'd to it
* 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 16:02 logmsgbot: marktraceur@tin Synchronized php-1.26wmf22/: [SWAT] [wmf22] Revert opera redirect loop fix that caused redirect loops in Firefox (duration: 02m 30s)
* 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 15:55 mobrovac: restbase disabled puppet on rb100x
* 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 15:45 logmsgbot: marktraceur@tin Synchronized php-1.26wmf22/extensions/UploadWizard/resources/transports/mw.FormDataTransport.js: [SWAT] [wmf22] Always set 'offset' with chunked uploads, even for first chunk (offset == 0) (duration: 02m 21s)
* 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 15:26 ottomata: started hadoop decommission of analytics1016
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 15:21 logmsgbot: marktraceur@tin Synchronized wmf-config/: [SWAT] Attempting another sync to mw2187 hoping it's up now (duration: 02m 22s)
* 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 15:05 logmsgbot: marktraceur@tin Synchronized wmf-config/: [SWAT] [config] Beta: Enable Content Translation suggestions (duration: 02m 22s)
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s)
* 13:35 moritzm: enabled ferm on mediawiki app servers in codfw
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 13:30 jynus: performing schema change and maintenance on officewiki and all public wikis with flow enabled
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 12:51 moritzm: enabled ferm on mediawiki API servers in codfw
* 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 12:36 moritzm: enabled ferm on mediawiki video scalers, image scalers and job runners in codfw
* 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s)
* 09:20 mobrovac: restbase deploying 0182962
* 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet
* 06:13 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep 10 06:13:14 UTC 2015 (duration 13m 13s)
* 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124
* 03:02 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-10 03:02:45+00:00
* 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - [[phab:T336434|T336434]]
* 02:59 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 06m 10s)
* 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] [[phab:T335042|T335042]]
* 02:51 logmsgbot: krenair@tin Synchronized php-1.26wmf22/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/237304 (duration: 00m 11s)
* 18:33 brett: Rolling out maglev LVS scheduler in eqsin - [[phab:T263797|T263797]]
* 02:50 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/237303 (duration: 00m 10s)
* 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 02:43 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-10 02:43:20+00:00
* 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 02:36 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 10m 45s)
* 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 02:24 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/resources/src/mediawiki/mediawiki.js: Ic0b1fb64ee7 backport (duration: 00m 12s)
* 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 01:04 logmsgbot: ori@tin Synchronized php-1.26wmf21/extensions/NavigationTiming: I2605c746b: Ensure timings are reported after the page has loaded (duration: 00m 13s)
* 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 01:03 logmsgbot: ori@tin Synchronized php-1.26wmf22/extensions/NavigationTiming: I2605c746b: Ensure timings are reported after the page has loaded (duration: 00m 12s)
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:54 mutante: powercycling unresponsive mw1154
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:30 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:29 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:27 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:27 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:26 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:15 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:15 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
* 14:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
* 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
* 14:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
* 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 14:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 13:56 volans: re-enabled puppet on the install hosts to deploy changes for [[phab:T336485|T336485]]
* 13:45 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 13:33 volans: disabling puppet on the install hosts to deploy changes for [[phab:T336485|T336485]]
* 13:00 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:58 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:58 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48228 and previous config saved to /var/cache/conftool/dbconfig/20230515-111624-ladsgroup.json
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48227 and previous config saved to /var/cache/conftool/dbconfig/20230515-110118-ladsgroup.json
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48226 and previous config saved to /var/cache/conftool/dbconfig/20230515-104611-ladsgroup.json
* 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48225 and previous config saved to /var/cache/conftool/dbconfig/20230515-103105-ladsgroup.json
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48224 and previous config saved to /var/cache/conftool/dbconfig/20230515-102038-ladsgroup.json
* 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
* 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
* 10:19 Amir1: Removing db1123 from zarcillo [[phab:T334910|T334910]]
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1123.eqiad.wmnet
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48223 and previous config saved to /var/cache/conftool/dbconfig/20230515-101329-ladsgroup.json
* 10:13 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 10:11 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1123.eqiad.wmnet
* 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48222 and previous config saved to /var/cache/conftool/dbconfig/20230515-095823-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1123 from dbctl [[phab:T334910|T334910]]', diff saved to https://phabricator.wikimedia.org/P48221 and previous config saved to /var/cache/conftool/dbconfig/20230515-095412-ladsgroup.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1123 [[phab:T334910|T334910]]', diff saved to https://phabricator.wikimedia.org/P48220 and previous config saved to /var/cache/conftool/dbconfig/20230515-094938-ladsgroup.json
* 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48219 and previous config saved to /var/cache/conftool/dbconfig/20230515-094317-ladsgroup.json
* 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15802
* 09:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15802
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48218 and previous config saved to /var/cache/conftool/dbconfig/20230515-092810-ladsgroup.json
* 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48217 and previous config saved to /var/cache/conftool/dbconfig/20230515-091139-ladsgroup.json
* 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
* 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
* 09:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 09:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 08:45 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:45 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:26 elukey: restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - [[phab:T335756|T335756]]
* 08:26 volans: installed spicerack_7.1.0 on cumin1001
* 08:22 volans: installed spicerack_7.1.0 on cumin2002
* 08:08 volans: uploaded spicerack_7.1.0 to apt.wikimedia.org bullseye-wikimedia
* 05:36 _joe_: building bookworm image for the first time [[phab:T335560|T335560]]


== 2015-09-09 ==
== 2023-05-12 ==
* 23:34 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 22:59 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 23:31 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 23:29 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:23 MaxSem: deployed Kartotherian config updates
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
* 23:23 logmsgbot: catrope@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 11s)
* 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
* 23:22 RoanKattouw: Running updateinterwikicache
* 22:32 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 23:13 logmsgbot: catrope@tin Synchronized php-1.26wmf22/extensions/WikimediaMaintenance: SWAT (duration: 00m 13s)
* 22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 23:13 logmsgbot: catrope@tin Synchronized php-1.26wmf22/extensions/Flow: SWAT (duration: 00m 32s)
* 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 23:12 logmsgbot: catrope@tin Synchronized php-1.26wmf21/extensions/WikimediaMaintenance: SWAT (duration: 00m 14s)
* 21:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS buster
* 23:12 logmsgbot: catrope@tin Synchronized php-1.26wmf21/extensions/Flow: SWAT (duration: 00m 29s)
* 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS buster
* 20:17 subbu: deployed parsoid version ffd0b444
* 20:08 mutante: gerrit1001 - systemctl mask gerrit [[phab:T326368|T326368]]
* 18:15 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf22
* 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 16:47 andrewbogott: systemctl stop nodepool on labnodepool1001
* 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 16:06 logmsgbot: aude@tin Synchronized database lists: Remove unused usagetracking.dblist (duration: 00m 12s)
* 18:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 16:01 logmsgbot: krenair@tin Synchronized robots.txt: https://gerrit.wikimedia.org/r/#/c/236200/ (duration: 00m 12s)
* 18:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:57 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236701/ - noop (duration: 00m 12s)
* 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1001']
* 15:56 ejegg: updated payments from 4c5e30288370db926cbbf7a7528edb9c41c65716 to 9fc8ab40b7f70c7b588c2b9e7b5c94b1f893faa1
* 17:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001']
* 15:50 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237104/ (duration: 00m 12s)
* 17:59 sukhe: running authdns-update for CR 919388
* 15:46 logmsgbot: krenair@tin Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/237097/ (duration: 00m 12s)
* 17:31 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] (duration: 150m 34s)
* 15:46 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237097/ (duration: 00m 12s)
* 17:27 sukhe: set routing-options static route 208.80.153.240/28 [high-traffic2, codfw] next-hop 10.192.16.140: [[phab:T326767|T326767]]
* 15:43 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf21/resources/src/mediawiki/mediawiki.searchSuggest.js: Enable completion suggester AB experiment (duration: 00m 12s)
* 17:21 sukhe: restart pybal on lvs2012 to pick up bgp med change: [[phab:T326767|T326767]]
* 15:43 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf21/extensions/WikimediaEvents/: Enable suggester AB experiment (duration: 00m 11s)
* 17:11 sukhe: homer "cr*-codfw*" commit "Gerrit: 917924 add new LVS host lvs2012": [[phab:T326767|T326767]]
* 15:38 logmsgbot: krenair@tin Synchronized php-1.26wmf22/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/237091/ (duration: 00m 21s)
* 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:26 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234425/ (duration: 00m 12s)
* 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
* 15:21 logmsgbot: krenair@tin Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/236994/ (duration: 00m 12s)
* 16:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:15 bd808: Running sync-common manually on mw2187.codfw.wmnet. Host is missing l10n cache files
* 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
* 15:12 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236025/ (duration: 00m 11s)
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
* 15:10 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236042/ (duration: 00m 13s)
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
* 14:03 mutante: beginning mailman migration - expect lists to be down
* 16:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
* 13:14 moritzm: enabled ferm on test.wikipedia.org (mw1017)
* 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
* 13:05 urandom: issuing Cassandra repair on restbase1001 (nodetool repair -pr)
* 16:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
* 13:02 moritzm: enabled ferm on various initial mediawiki hosts in codfw: videoscaler (mw2007), appserver (mw200[89]), jobrunner (mw2081), api (mw2050), imagescaler (mw2086)
* 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
* 10:33 logmsgbot: aude@tin Synchronized wmf-config/CommonSettings.php: Remove unused usagetracking tag (duration: 00m 11s)
* 16:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
* 10:30 logmsgbot: aude@tin Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 12s)
* 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
* 10:26 logmsgbot: aude@tin Synchronized wmf-config/InitialiseSettings.php: rv usage tracking (duration: 00m 12s)
* 15:54 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
* 10:23 logmsgbot: aude@tin Synchronized usagetracking.dblist: Enable usage tracking on commons and test2wiki (duration: 00m 11s)
* 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
* 10:21 logmsgbot: aude@tin Synchronized wikidataclient.dblist: Sorted dblist (duration: 00m 12s)
* 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.