You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Labslogbot
(krenair@mira Synchronized php-1.27.0-wmf.12/tests: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 08s) (logmsgbot))
imported>Stashbot
(Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 (T337446))
 
== 2016-02-04 ==
== 2023-05-27 ==
* 01:04 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/tests: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 08s)
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 01:01 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.12/includes/parser: https://gerrit.wikimedia.org/r/#/c/268332/ (duration: 02m 25s)
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 01:00 moritzm: rebooting iridium (phabricator host) for kernel update
* 00:03 tzatziki: removing 1 file for legal compliance
* 00:42 YuviPanda: yuvipanda@labstore2001:~$ sudo lvremove backup/maps20160121040005
* 00:41 YuviPanda: yuvipanda@labstore2001:~$ sudo lvremove backup/tools20160121020007
* 00:04 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.12


== 2016-02-03 ==
== 2023-05-26 ==
* 23:53 moritzm: repooling restbase2002 , depooling restbase2003 for kernel/Java update
* 23:48 tzatziki: removing 2 files for legal compliance
* 23:39 moritzm: repooling restbase2001 , depooling restbase2002 for kernel/Java update
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:36 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.12
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:29 hashar: passing wmf12 responsibility to thcipriani  . Crashing to bed myself.
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 23:22 moritzm: depooling restbase2001 for kernel/Java update
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 23:15 moritzm: rebooting wdqs1002 for kernel update
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:08 hashar: Full script of my deployment session is on mira.codfw.wmnet:/home/hashar/wmf12-deploy.script
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:07 logmsgbot: hashar@mira rebuilt wikiversions.php and synchronized wikiversions files: Clarify only testwiki and test2wiki are on php-1.27.0-wmf.12
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 23:07 moritzm: rebooting wdqs1001 for kernel update
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 22:51 hashar: test / test2 wikis are incredibly slow. Filed https://phabricator.wikimedia.org/T125727
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 22:47 subbu: finished deploying parsoid sha 98619f7f
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 22:43 logmsgbot: hashar@mira rebuilt wikiversions.php and synchronized wikiversions files: test2wiki to php-1.27.0-wmf.12
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 22:43 hashar: sync-wikiversions "test2wiki to php-1.27.0-wmf.12"
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 10s)
* 22:41 moritzm: repooling restbase1009
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 22:38 logmsgbot: hashar@mira Finished scap: to properly sync other master tin due to l10nupdate ui mismatch (duration: 24m 27s)
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 22:34 moritzm: repooling restbase1006 , depooling restbase1009 for kernel/Java update
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 22:34 hashar: Still looking at test.wikipedia.org being super "slow" . scap still rebuilding though
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:32 ejegg: updated payments-wiki from 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03 to fad669c99db8240b26a524aa70c85cfebd13a18c
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 22:21 moritzm: repooling restbase1005 , depooling restbase1006 for kernel/Java update
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 22:14 ejegg: rolled payments-wiki back to 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 22:14 hashar: https://test.wikipedia.org/  switched to 1.27.0-wmf.12
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 22:13 logmsgbot: hashar@mira Started scap: to properly sync other master tin due to l10nupdate ui mismatch
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 22:13 subbu: restarted parsoid on wtp1002 as a canary
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 22:13 logmsgbot: hashar@mira Finished scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions) (duration: 20m 23s)
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 22:07 moritzm: repooling restbase1004 , depooling restbase1005 for kernel/Java update
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 22:06 ejegg: updated payments-wiki from 1817327b4b0919ebe26bbd8b9d84fac1bd7ddb03 to 52afbc735ef5d759fd42bef072bed286fe3a5581
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 22:06 subbu: starting parsoid deploy
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 22:03 mutante: mira, tin: find /srv/mediawiki-staging/ -uid 1001 -exec chown 10002 {} \;
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 21:53 hashar: reopened https://phabricator.wikimedia.org/T119165 ''l10nupdate user uid mismatch between tin and mira''
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 21:52 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions)
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 21:51 mutante: tin - find / -uid 1001 -exec chown 10002 {} \;
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 21:49 mutante: tin - fixing UID of l10nupdate user (T119165)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 21:45 moritzm: depooling restbase1004 for kernel/Java update
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 21:45 moritzm: repooling restbase1003
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 21:35 hashar: mismatching uid for l10nupdate user between mira and tin
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 21:34 logmsgbot: hashar@mira scap aborted: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions) (duration: 07m 41s)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 21:32 moritzm: depooling restbase1003 for kernel/Java update
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 21:27 moritzm: repooling restbase1008
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 21:26 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (with proper branches for special_extensions)
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 21:25 hashar: mira had to hard reset  CentralNotice / SemanticMediaWiki / SemanticResultFormats / Validator after we pointed them from master to their proper branch, submodule attempted a rebase automatically.. That is a no no
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:14 moritzm: depooling restbase1008 for kernel/Java update
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 21:08 hashar: waiting for the submodule patch https://gerrit.wikimedia.org/r/#/c/268214/ to land and will scap again
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 20:33 logmsgbot: hashar@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.bTBpxD6CuI" ' returned non-zero exit status 1 (duration: 01m 13s)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 20:32 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache (after RandomRootPage had a dummy entry point added)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 20:31 logmsgbot: demon@mira Synchronized php-1.27.0-wmf.12/extensions/RandomRootPage/: unbreak (duration: 01m 19s)
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 20:23 logmsgbot: hashar@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2188303825" --threads=10 --lang en  --quiet' returned non-zero exit status 255 (duration: 01m 49s)
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 20:21 logmsgbot: hashar@mira Started scap: testwiki to php-1.27.0-wmf.12 and rebuild l10n cache
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 20:20 hashar: Hacked wikiversions.json to only have testwiki on .12
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 19:58 logmsgbot: demon@mira Synchronized wmf-config/InitialiseSettings.php: touch (duration: 01m 19s)
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 19:49 logmsgbot: demon@mira Synchronized wmf-config/: fix wikibase/mobilefrontend config (duration: 01m 19s)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:48 robh: halting puppet on carbon for a few minutes to livehack a partition recipe change in netboot.cfg
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 19:45 hashar: https://phabricator.wikimedia.org/T125672  blocking wmf.12 "Notice: Undefined variable: wgMFQueryPropModules in /srv/mediawiki/wmf-config/Wikibase.php on line 120"
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 19:39 akosiaris: hot patch OTRS installation with https://github.com/OTRS/otrs/commit/c7ea6d64e02518e166fbac02f42f25dacad54342
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 19:35 hashar: mira: manually fixed /php and /w/static/current symlinks to point back to .10  (wikiversions migrated them to .11 which we skip)
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 19:30 moritzm: repooling restbase1002
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 19:29 hashar: Create patches to update wikiversions.json
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:24 hashar: Applying security patches on mira
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 19:24 hashar: starting train deployment of 1.27.0-wmf.12
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 19:09 csteipp: deployed patch for T125684
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 19:08 moritzm: depooling restbase1002 for kernel/Java update
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:38 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings-labs.php: Experiment one: Labs stripping HTML in beta (360e5af) (duration: 01m 19s)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 18:34 moritzm: rebooting californium for kernel update
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:16 bblack: restarting pybal on lvs1001
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:04 jynus: previous announcement was for db2011, not db2010
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 18:02 jynus: starting slave IO thread on db2010
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 17:32 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1060 after maintenance (duration: 01m 20s)
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 17:22 mobrovac: restbase restarting rb1001
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 17:03 mdholloway: mobileapps deployed 68e38ec
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 17:02 bblack: restarting pybal on lvs1004 (not 1003!) T125397
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:02 bblack: restarting pybal on lvs1003 T125397
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 16:57 hashar: mira: updating /srv/mediawiki-staging/php-1.27.0-wmf.12  (prep deployment train)
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 16:55 logmsgbot: thcipriani@mira Synchronized wmf-config/CirrusSearch-production.php: SWAT: Return more like search queries to codfw [[gerrit:268097]] (duration: 01m 17s)
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 16:45 logmsgbot: thcipriani@mira Synchronized wmf-config/CommonSettings.php: SWAT: Remove unused/no longer existing item-create oauth grant [[gerrit:265447]] (duration: 01m 18s)
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 16:39 logmsgbot: thcipriani@mira Synchronized wmf-config: SWAT: Enable math data type on test wikidata + test wikipedias [[gerrit:268086]] (duration: 01m 18s)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/mobile.php: SWAT: Remove section collapsing config [[gerrit:267776]] (duration: 01m 18s)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 16:28 akosiaris: OTRS migration to 4.0 completed, starting upgrade to 5.0
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 16:24 logmsgbot: thcipriani@mira Synchronized wmf-config/CommonSettings.php: SWAT: MW parsoid URLs: s/parsoidcache/parsoid/ [[gerrit:267234]] (duration: 01m 18s)
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 16:18 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Add 2 sites to $wgCopyUploadsDomains [[gerrit:262893]] (duration: 01m 18s)
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 16:13 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Just use the default MobileFrontend specified page actions. Part II [[gerrit:267807]] (duration: 01m 18s)
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 16:11 logmsgbot: thcipriani@mira Synchronized wmf-config/mobile.php: SWAT: Just use the default MobileFrontend specified page actions. Part I [[gerrit:267807]] (duration: 02m 14s)
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 15:41 hashar: mira symlink pointing to current version got changed to wmf.11 by the checkoutMediaWiki script.   Manually changed to proper wmf.10 https://phabricator.wikimedia.org/T125475#1994078
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 15:32 jynus: restart and reconfigure mysql in db1060
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 15:30 hashar: MediaWiki 1.27.0-wmf.12, from 1.27.0-wmf.12, successfully checked out.
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 15:23 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1060 (duration: 00m 43s)
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 15:21 hashar: mira: cloning 1.27.0-wmf.12  (no link updates)
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 15:15 bblack: rebooting cp1060 (depooled/downtimed)
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 15:11 bblack: depooling cp1060 temporarily from cache_mobile varnish backends
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster  - [[phab:T329366|T329366]]
* 14:56 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1054 with low weight, repool db1067 with original weight (duration: 01m 22s)
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 14:50 bblack: rebooting cp1008 for kernel
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 14:28 godog: investigating uwsgi processes for graphite-web not coming up after reboot
* 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 14:10 moritzm: rebooting graphite1001 for kernel update
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 13:41 godog: powercycle ms-be2015
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 13:39 jynus: restarting and reconfiguring mysql at db1054
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 13:27 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1067 at low weight; depool db1054 (duration: 01m 16s)
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 11:45 jynus: restarting and reconfiguring mysql at db1067
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:11 moritzm: repooling restbase1001
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:04 akosiaris: OTRS database upgraded to 3.3, moving on with 4.0
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 11:00 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1063 at 100% load; depool db1067 for maintenance (duration: 01m 16s)
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 10:48 moritzm: depooling restbase1001 for kernel/Java update
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 10:37 _joe_: ending the load test on the eqiad apaches
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 10:11 moritzm: reboot francium for kernel update
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 09:53 jynus: m2 backup finished on /srv/backups/2016-02-03_08-51-06, filename 'db1020-bin.000842', position 220103947
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 09:50 moritzm: restarting neodymium for kernel update
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 09:49 _joe_: doing some basic load test on appservers in eqiad
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:52 akosiaris: stop otrs-daemon on mendelevium
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:51 jynus: starting mysql backup on db1020 (/srv/backups)
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:44 akosiaris: stop slave on db2011, db1020's (m2-master) slave, for OTRS migration. DO NOT ENABLE
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:40 akosiaris: stop exim4, cron, apache2 on iodine, mendelevium
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 akosiaris: disabling puppet on iodine, mendelevium, OTRS migration
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:24 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool db1063 with low weight (duration: 01m 20s)
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)


== 2016-02-02 ==
== 2023-05-25 ==
* 23:13 logmsgbot: demon@mira Finished scap: everything re-sync one more time for good measure (duration: 17m 04s)
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 22:56 logmsgbot: demon@mira Started scap: everything re-sync one more time for good measure
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:50 bblack: repooling scap proxies: mw10033, mw1070, mw1097, mw1216
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 22:45 chasemp: restart hhvm & apache2 on mw1235.eqiad.wmnet
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 22:44 _joe_: restarted hhvm on mw1231, stat_cache again
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 22:42 logmsgbot: demon@mira Finished scap: resync final batch with master (duration: 06m 48s)
* 20:47 TheresNoTime: close UTC late backport
* 22:35 logmsgbot: demon@mira Started scap: resync final batch with master
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 22:31 logmsgbot: demon@mira Finished scap: re-sync batch of mw1136-50, mw1190-1220, mw2150-mw2200 with master (duration: 09m 33s)
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:22 logmsgbot: demon@mira Started scap: re-sync batch of mw1136-50, mw1190-1220, mw2150-mw2200 with master
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 22:20 ori: restarted HHVM on mw1243. Lock-up. Backtrace in /tmp/hhvm.2897.bt
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 22:20 logmsgbot: demon@mira Finished scap: re-sync batch of mw1101-1135,1240-1260, 2101-2150 with master (duration: 12m 51s)
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:07 logmsgbot: demon@mira Started scap: re-sync batch of mw1101-1135,1240-1260, 2101-2150 with master
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 22:00 logmsgbot: demon@mira Finished scap: re-sync batch of mw1151-mw1225, mw2174-mw2214 with master (duration: 11m 24s)
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 21:49 logmsgbot: demon@mira Started scap: re-sync batch of mw1151-mw1225, mw2174-mw2214 with master
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:45 logmsgbot: demon@mira Finished scap: re-sync batch of mw1051-1100, mw2051-2100 with master (duration: 13m 41s)
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 21:31 logmsgbot: demon@mira Started scap: re-sync batch of mw1051-1100, mw2051-2100 with master
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:28 logmsgbot: demon@mira Finished scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (2nd try) (duration: 14m 33s)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 21:27 _joe_: depooling eqiad scap-proxies
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 21:13 logmsgbot: demon@mira Started scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (2nd try)
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 21:04 logmsgbot: demon@mira scap aborted: re-sync batch of mw1025-1050 and mw2007-mw2050 with master (duration: 10m 11s)
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 20:54 logmsgbot: demon@mira Started scap: re-sync batch of mw1025-1050 and mw2007-mw2050 with master
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 20:32 hashar: mw1114-mw1119 are canary api appservers. Finished syncing
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 20:28 ori: restarted hhvm on mw1116
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 20:17 hashar: Running sync-common on mw1114-mw1119 (canary api appservers)
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 20:16 ostriches: mira: removed untracked wmf-config/x.php testing file
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 20:11 ori: Running sync-common on canary app servers (mw1017-mw1025)
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 19:46 hashar: Running sync-common on mw1260 (video scaler)
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 19:40 ori: Running sync-common on all jobscalers
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 19:35 ori: Running sync-common on mw1259 (video scaler) and mw1153 (image scaler) too
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 19:29 ori: Running sync-common on mw100[123]
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 18:59 _joe_: running sync-common on mw1020
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 18:54 _joe_: repooled mw1119
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 17:45 hashar: mira /srv/mediawiki-staging  git submodule update --init --recursive
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 17:43 hashar: mw1119 sync-common
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 17:37 godog: disable unused swift container-sync for wikibooks-ka-local-thumb wikibooks-hr-local-thumb wikibooks-km-local-thumb wikibooks-sk-local-thumb wikibooks-tr-local-thumb wikipedia-it-local-thumb.fc
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 17:36 hashar: mw1119:/srv/mediawiki/wmf-config/event-schemas is empty
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 _joe_: depooled mw1119, partial sync
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 16:59 hashar: files were /srv/mediawiki/docroot/wikimedia.org/WikipediaMobileFirefoxOS/.git and /srv/mediawiki/docroot/wikimedia.org/WikipediaMobileFirefoxOS/js/lib/MobileFrontend/.git
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 16:58 ostriches: mw1017: removed stray .git directory from WikipediaFirefoxMobileOS or w/e. It shouldn't be there anyway. sync-common is happy again on it
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 16:48 hashar: tin /srv/mediawiki-staging  : running git submodule update --init --recursive
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:47 hashar: tin /srv/mediawiki-staging  : running git submodule update --init
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 16:40 hashar: mw1017 sync-common --verbose
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 16:35 _joe_: sync-common on mw2030 and mw1161; re-enable puppet, jobrunner, jobchron on mw1161
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 16:34 _joe_: restarted puppet and rsync on both tin and mira, removed comments on the l10nupdate job on tin
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 16:23 logmsgbot: thcipriani@mira rebuilt wikiversions.php and synchronized wikiversions files: rebuild wikiversion.php
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 14:57 godog: disable swift container-sync for wikipedia-it-local-public.a7
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.10 php-1.27.0-wmf.10
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.9 php-1.27.0-wmf.9
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 14:43 hashar: tin /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.8 php-1.27.0-wmf.8
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 14:21 hashar: starting rebuilding /srv/mediawiki-staging from scratch on tin (not mira)
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 14:20 hashar: starting rebuilding /srv/mediawiki-staging from scratch on mira
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 14:04 bblack: nevermind, not looking at eeden
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 14:04 bblack: looking at eeden
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 13:58 moritzm: rebooting eeden for kernel update
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 13:09 moritzm: rolling reboot of scb* (for kernel update)
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 13:02 akosiaris: reboot dubnium for kernel upgrades
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 13:01 akosiaris: reboot pollux for kernel upgrades
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 12:45 moritzm: rebooting baham for kernel update
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 12:20 _joe_: stopping rsync on mira too, to avoid accidental deploys
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 12:15 _joe_: stopped puppet on mira, added a big warning in the motd
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 12:15 _joe_: stopped rsync, puppet, l10nupdate cronjob on tin
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 12:06 _joe_: stopped rsync on tin to avoid problems
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 11:38 moritzm: rolling reboot of aqs* (for kernel update)
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 11:24 hashar_: Restarting Zuul. Stuck in a dependency loop :(
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 11:12 jynus: restarting and reconfiguring mysql at db1063
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 10:51 _joe_: stopped jobrunner on mw1161 after failed sync-common
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 10:44 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1063, repool db1036 (duration: 00m 21s)
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 10:00 jynus: reconfigure and upgrade db1036
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 09:51 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Testing scap-reduce db1018 weight (duration: 00m 21s)
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 09:42 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1036, repool db1021 (duration: 00m 22s)
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 09:38 hashar: Jenkins is fully up and operational
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 09:36 jynus: armed keyholder on tin
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 09:34 dcausse: elastic (codfw and eqiad): unfreezing indices
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 09:33 moritzm: restarting gerrit on ytterbium for java security update
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 09:33 _joe_: re-syncing tin homes
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 09:32 hashar: gallium: apt-get upgrade  | Restarting Jenkins
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 09:12 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1036, repool db1021 (duration: 00m 21s)
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 09:08 dcausse: elastic (codfw and eqiad): freezing indices to stop titlesuggest maint scripts
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 09:03 godog: repool restbase1007 via confctl
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 08:13 jynus: restarting and upgrading db1021
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 08:02 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Pool db1018; Depool db1021 (duration: 00m 20s)
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 07:46 jynus: https://phabricator.wikimedia.org/rOMWC2ea9167221d11eb1880e4d26eae64a85cb9b2697 and https://phabricator.wikimedia.org/rOMWCa55d2bf8cd3a2853fac35d5b8239b8e8c2fe6a0f merged but not deployed
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 06:58 _joe_: reimaging tin.eqiad.wmnet
* 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 01:30 logmsgbot: ebernhardson@mira Finished scap: Add Cookie statement link to footer of all WMF wikis per legal (duration: 19m 42s)
* 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 01:11 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
* 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s)
* 01:07 logmsgbot: ebernhardson@mira scap failed: CalledProcessError Command '/srv/deployment/scap/scap/bin/refreshCdbJsonFiles --directory="/srv/mediawiki-staging/php-1.27.0-wmf.10/cache/l10n" --threads=10 ' returned non-zero exit status 255 (duration: 03m 31s)
* 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 01:03 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
* 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 00:31 logmsgbot: ebernhardson@mira scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="cawikibooks" --outdir="/tmp/scap_l10n_1684485672" --threads=10  --quiet' returned non-zero exit status 255 (duration: 02m 35s)
* 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 00:30 mobrovac: restbase deploy end of c3bd864
* 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 00:29 logmsgbot: ebernhardson@mira Started scap: Add Cookie statement link to footer of all WMF wikis per legal
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
* 00:26 logmsgbot: ebernhardson@mira Synchronized wmf-config/logging.php: Revert "monolog: Ensure that context data added by WebProcessor is utf-8 safe" (duration: 01m 27s)
* 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s)
* 00:23 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: Move morelike query load back to eqiad to allow load testing on codfw (duration: 01m 38s)
* 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]]
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000 to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}


== 2016-02-01 ==
== 2023-05-24 ==
* 23:51 mobrovac: restbase deploy start of c3bd864 on canary rb1001
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 19:28 logmsgbot: ori@mira Synchronized docroot/wikipedia.org/speed-tests: I5b48a491390: Speed trials: add preconnect (duration: 01m 27s)
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:54 bblack: banned obj.http.Content-Length == 13817 on all cache_text
* 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]]
* 18:54 mutante: LDAP - added elukey to "ops" group
* 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s)
* 18:11 mutante: planet1001 - rebooting for upgrade
* 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 17:54 hoo: restarted hhvm on mw1253
* 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 17:06 logmsgbot: thcipriani@mira Synchronized wmf-config: SWAT: Use extension registration for Graph [[gerrit:266433]] (duration: 01m 29s)
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s)
* 16:59 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink on or.wikipedia.org [[gerrit:267194]] (duration: 01m 31s)
* 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:54 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WikidataPageBanner on es.wikivoyage [[gerrit:267195]] (duration: 01m 29s)
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 16:52 _joe_: restarted pybal on lvs1001
* 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:47 _joe_: installing the new HHVM package to the api appserver cluster in eqiad
* 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:38 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set WikidataPageBanner namespaces on fr.wikivoyage [[gerrit:266541]] (duration: 01m 26s)
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:32 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on cu.wikipedia [[gerrit:265885]] (duration: 01m 26s)
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:26 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Centralise all VisualEditor feedback pages except for a few wikis [[gerrit:258206]] (duration: 01m 30s)
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:22 logmsgbot: thcipriani@mira Synchronized dblists/visualeditor-default.dblist: SWAT: Enable VisualEditor by default for some other wikis [[gerrit:264765]] (duration: 01m 58s)
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:05 ema: hhvm restarted on mw1072
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:54 logmsgbot: krenair@mira Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 01m 52s)
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:48 bblack: restarted pybal on lvs1004 (lvs1003 above was a bad log message!)
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:42 bblack: restarted pybal on lvs1003
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:13 bblack: cp3042 repooled
* 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 15:10 ema: restarting hhvm on mw1057
* 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330216|T330216]]
* 14:33 chasemp: labstore1002 cfg scheduling
* 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 14:04 godog: set ms-be1019 swift weight to 4000
* 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 13:33 moritzm: rolling reboot of xenon/cerium/praseodymium for kernel update (and updating to new openjdk-8)
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:40 _joe_: depooling cp3042 from esams uploads
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:15 _joe_: backing up tin homes before reimaging
* 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:59 moritzm: rolling reboot of ms-be1016 to ms-be1021 for kernel update
* 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:39 moritzm: uploaded openjdk-8 8u72-b15-1~bpo8+1 for jessie-wikimedia to carbon
* 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:34 moritzm: uploaded openssl 1.0.2f for jessie-wikimedia to carbon
* 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}}
* 11:19 godog: repool restbase1007
* 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s)
* 10:32 godog: reboot ms-be1010, xfs
* 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance
* 10:27 jynus: partitioning revision and logging for db2037 and db2044 (s4)
* 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]]
* 00:04 logmsgbot: tstarling@mira Synchronized php-1.27.0-wmf.11/includes: (no message) (duration: 01m 31s)
* 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s)
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]]
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]]
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}}
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}}
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:18 aqu: analytics-refinery, about to deploy
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s)
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s)
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]]
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]]
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s)
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]]
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s)
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]]
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s)
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]]
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s)
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]]
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s)
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances, as the core search backend was dead.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]]
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions {{Gerrit|5227369}} --mark [[phab:T337392|T337392]]` [[phab:T337392|T337392]]
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]]
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]]
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s)
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]]
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
* 01:19 mutante: contint2001 - jenkins started again
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:45 mutante: short maintenance on main contint server (jenkins)
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance


== 2016-01-31 ==
== 2023-05-23 ==
* 23:58 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/VisualEditor/extension.json: https://gerrit.wikimedia.org/r/#/c/267617/ (duration: 01m 28s)
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 22:31 ori: restarted parsoid-rt-client.service
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 22:14 ori: Updated parsoid on ruthenium and restarted parsoid-rt-client on ruthenium, per subbu's request.
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 22:03 bd808: backfilled missing data in https://tools.wmflabs.org/sal/production from https://wikitech.wikimedia.org/wiki/Server_Admin_Log
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 21:37 bd808: https://tools.wmflabs.org/sal/production missing data from 2016-01-30 until now
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 21:33 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.10/includes/jobqueue/jobs/HTMLCacheUpdateJob.php: Live-hacked wfDebugLog() call for T124418 (duration: 01m 31s)
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 16:01 tgr: changed wikiversions.php on mw1017 to serve wmf.10 for SessionManager-related debugging
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 05:35 legoktm: restarted extensions/CentralAuth/maintenance/resetGlobalUserTokens.php
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 31 02:32:12 UTC 2016 (duration 7m 11s)
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 10m 14s)
* 21:02 TheresNoTime: close UTC late backport window
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s)
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]]
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s)
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}}
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]]
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}


== 2016-01-30 ==
* 23:20 logmsgbot: bd808@mira rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10 (again)
* 23:01 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Revert Enable debug level session logging to fluorine (17bfb06) (duration: 01m 28s)
* 22:36 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Enable debug level session logging to fluorine (5ac9412) (duration: 01m 26s)
* 18:43 _joe_: updated visualdiff, restarted parsoid-vd
* 13:00 godog: discard preserved cache on ms-be2003, powercycle
* 03:40 Krenair: Deleted old /srv/mediawiki/php-1.27.0-wmf.[1-5] directories across the cluster to match the deployment tree, T124567
* 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 30 02:31:56 UTC 2016 (duration 7m 2s)
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 24s)
* 00:08 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Remove proposed fix for T125267 (duration: 01m 33s)

== 2023-05-22 ==
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
* 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]]
* 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
* 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
* 20:27 TheresNoTime: close UTC late backport window
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s)
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]]
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s)
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]]
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
* 16:58 XioNoX: push mgmt_junos to all L2 switches
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]]
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:02 moritzm: installing updated usb.ids packages for Bullseye
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]]
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
* 08:22 moritzm: installing systemd security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json


== 2016-01-29 ==
== 2023-05-21 ==
* 23:53 jynus: restarted db1018 replication (and its codfw slaves) after a (somewhat) failed maintenance
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 23:41 mutante: ruthenium - restart parsoid-rt-client, parsoid-vd-client
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 23:37 mutante: ruthenium - git pull origin in /srv/visualdiff/
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 23:22 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Testing proposed fix for T125267 (duration: 01m 26s)
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 22:52 jynus: powercycling cp3042 to test it is really the broken one
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 22:37 jynus: powercycle cp3049, not 42
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 22:37 jynus: powercycle cp3042
* 22:27 mutante: cp3042 -  md0: unknown partition table
* 22:23 mutante: powercycled cp1049
* 22:06 mutante: powercycle cp3049
* 21:13 mutante: bromine - stop and remove rsync service
* 20:16 logmsgbot: aaron@mira Synchronized wmf-config/CommonSettings.php: Use the logical redis definition for GettingStarted (duration: 01m 26s)
* 19:36 jynus: reinstall db1018
* 18:11 jynus: creating special partitioning for db2037 and db2044 (ETA:5 days, lag)
* 18:01 jynus: creating special partitioning for db2034 and db2042 (ETA:5 days, lag)
* 17:51 logmsgbot: bd808@mira Synchronized wmf-config/InitialiseSettings.php: Stop the first survey in fawiki and eswiki (f89621d) (duration: 01m 25s)
* 17:44 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/api/ApiMain.php: Log user-agents that are using HTTP when HTTPS is preferred (55ac0b7) (duration: 01m 26s)
* 17:41 logmsgbot: bd808@mira Synchronized wmf-config/CommonSettings.php: Grant autocreateaccount to anons on loginwiki (d916008) (duration: 01m 27s)
* 17:39 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: CentralAuth: Take auto-creation into account (f526ef1) (duration: 01m 28s)
* 17:35 logmsgbot: bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: SessionManager: Save user name to metadata even if the user doesn't exist locally (a39b4ac) (duration: 01m 29s)
* 17:01 jynus: restarting mysql at db1018
* 16:50 robh: parsoid-vd restart was due to subbu irc request (i wasnt just randomly restarting things ;)
* 16:47 robh: restarting parsoid-vd & parsoid-vd-client on ruthenium
* 16:33 ottomata: uninstalling impala in analytics cluster
* 15:45 bblack: upgrade packages (incl kernel) on eqiad caches hosts (cp1xxx)
* 15:37 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool db1018 for maintenance (duration: 01m 49s)
* 15:32 akosiaris: remove all networking configuration from asw-b-eqiad switch for nas1001-a, nas1001-b. Leave just descriptions
* 15:21 bblack: upgrading packages (incl kernel) on esams cache hosts (cp3xxx) (codfw, ulsfo already done)
* 15:11 akosiaris: powering off nas1001-a.eqiad.wmnet. https://phabricator.wikimedia.org/T124156
* 15:08 akosiaris: powering off nas1001-b.eqiad.wmnet. https://phabricator.wikimedia.org/T124156
* 15:01 elukey: re-enabled puppet on analytics1027
* 14:39 elukey: stopped kafka (service) on kafka1012 (the host that caused the outage)
* 14:24 moritzm: rebooting bohrium for kernel update
* 14:04 _joe_: installing the new hhvm package on all the codfw appserver
* 13:43 _joe_: installing the new HHVM package to the canary appservers (main and api)
* 12:30 paravoid: force-rebooting pollux
* 11:43 _joe_: uploaded hhvm_3.6.5+dfsg1-1+wm8 to trusty-wikimedia
* 11:22 moritzm: rolling restart of swift in codfw
* 11:14 elukey: disabled puppet on analytics1027 due to issues with Camus and HDFS
* 10:17 moritzm: rolling restart of swift in esams
* 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Jan 29 02:32:56 UTC 2016 (duration 7m 28s)
* 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 40s)
* 01:31 logmsgbot: ori@mira Synchronized wmf-config: I83da57cf: Enable persistent redis connections for job runners (duration: 01m 11s)
* 01:03 logmsgbot: krenair@mira Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/267186/ (duration: 01m 09s)
* 01:01 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/265292/ (duration: 01m 14s)
* 00:57 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267071/ (duration: 01m 11s)
* 00:53 logmsgbot: krenair@mira Synchronized wmf-config/CirrusSearch-production.php: https://gerrit.wikimedia.org/r/#/c/266995/ (duration: 01m 11s)
* 00:50 yurik: synced latest graphoid
* 00:49 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/MobileFrontend/resources/skins.minerva.editor/init.js: https://gerrit.wikimedia.org/r/#/c/267168/ (duration: 01m 12s)
* 00:45 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 10s)
* 00:43 logmsgbot: krenair@mira Synchronized wmf-config/CirrusSearch-common.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 10s)
* 00:42 logmsgbot: krenair@mira Synchronized tests/cirrusTest.php: https://gerrit.wikimedia.org/r/#/c/267053/ (duration: 01m 11s)
* 00:35 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267025/ (duration: 01m 12s)
* 00:25 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Graph/modules/graph2.js: https://gerrit.wikimedia.org/r/#/c/267065/ (duration: 01m 11s)
* 00:17 logmsgbot: krenair@mira Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/267060/ (duration: 01m 12s)
* 00:02 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267189/2 (duration: 01m 11s)


== 2016-01-28 ==
== 2023-05-20 ==
* 23:51 mutante: caesium - stop puppet, shutdown server, remove from icinga, clean puppet cert ...
* 18:25 effie: restart varnish cp3061
* 23:46 Tim: on ruthenium installing build dependencies and compiling uprightdiff for test
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 23:20 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.11/includes/api/ApiStashEdit.php: Ia4196eba9: Add ParserOutputStashForEdit hook for extension cache warming (duration: 01m 10s)
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 23:17 logmsgbot: tgr@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionManager.php: T125161 (duration: 01m 11s)
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:58 ottomata: restoring MobileWebSectionUsage_14321266 from db1047 to dbstore1002 using mysqlimport
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 22:23 bblack: starting cache_mobile->cache_text conversion in eqiad - https://phabricator.wikimedia.org/T109286
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 22:09 bblack: eqiad pybal->etcd conversion done
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:01 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11/extensions/WikimediaEvents/WikimediaEventsHooks.php: deploying fix for T125151 (duration: 01m 15s)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 21:59 mutante: releases.wm.org - switched backend to bromine
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 21:58 bblack: converting active eqiad LVS/pybal to etcd
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:56 mutante: caesium - stopped apache
* 21:31 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.11/extensions/AbuseFilter: I13fcc3ce4: Updated mediawiki/core Project: mediawiki/extensions/AbuseFilter  19baa3b6e51b8fe6baf6e3ce7e590060e8e6eec9 (duration: 01m 11s)
* 21:27 bblack: converting backup/inactive eqiad LVS/pybal to etcd
* 21:16 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.11
* 20:54 mutante: sca1001 - stop mathoid,graphoid,citoid
* 20:52 mutante: sca1002 - stop mathoid,graphoid,citoid
* 20:50 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11: syncing 1.27.0-wmf.11 for T125114 and https://gerrit.wikimedia.org/r/#/c/267128/ (duration: 03m 30s)
* 20:25 bblack: depool -> reboot cp4008 (ulsfo text, trying new kernel with live traffic)
* 20:00 bblack: depool -> reboot cp4011 (ulsfo mobile, currently unused for traffic - testing local conftool-scripts depool + new kernel)
* 19:55 logmsgbot: ori@mira Synchronized wmf-config: Iea2573ccfbe: Revert "Autopromotion: remove deprecated onView event, fix INGROUPS" (duration: 02m 13s)
* 19:43 ori: added tgr and marxarelli to security group on phab
* 19:26 ottomata: kafka preferred-replica-election to rebalance analytics-eqiad brokers
* 18:22 elukey: rebooting analytics1001 for new kernel upgrade
* 18:21 yurik: deployed graphoid
* 17:43 elukey: rebooting analytics1002.eqiad.wmnet (Hadoop master's slave) for kernel upgrade
* 17:39 urandom: finished deploying configuration change (https://gerrit.wikimedia.org/r/266299) to restbase staging
* 17:38 robh: neglected to log that I finished icinga/neon updates and it's back to normal service (never interrupted)
* 17:38 urandom: restarting restbase on restbase200[1-3].codfw.wmnet (restbase staging)
* 17:34 urandom: forcing puppet run on restbase200[1-3].codfw.wmnet (restbase staging)
* 17:30 urandom: forcing puppet run on praseodymium.eqiad.wmnet, and restarting restbase (staging env)
* 17:27 urandom: restarting restbase on xenon.eqiad.wmnet (restbase staging)
* 17:25 urandom: forcing puppet run on xenon.eqiad.wmnet (restbase staging)
* 17:21 urandom: restarting restbase on cerium.eqiad.wmnet
* 17:18 urandom: forcing puppet run on cerium.eqiad.wmnet (restbase staging)
* 17:18 robh: pushing icinga updates (shouldnt affect service but others shouldnt also try to update neon right now)
* 17:17 logmsgbot: krenair@mira Synchronized README: testing (duration: 02m 08s)
* 17:15 urandom: disabling puppet on restbase staging hosts
* 17:01 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266957/ (duration: 02m 15s)
* 16:52 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/267040/ (duration: 02m 13s)
* 16:48 cmjohnson1: mw1172, mw1178,mw1217, mw1257 powering off task# T124642
* 16:45 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264219/ (duration: 02m 12s)
* 16:42 logmsgbot: krenair@mira Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264219/ (duration: 02m 12s)
* 16:37 Krenair: Downloaded and `chmod +x`'d mira:/srv/mediawiki-staging/.git/hooks/commit-msg
* 16:29 mdholloway: mobileapps deployed 7583148, reverting in part 869ec35
* 16:25 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: rv (duration: 02m 10s)
* 16:25 bblack: upgrading packages (incl kernel) on all codfw caches
* 16:19 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266955/ (duration: 02m 14s)
* 16:13 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266564/ (duration: 02m 12s)
* 16:05 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264733/ (duration: 02m 11s)
* 15:39 bblack: kafka1012 booted up normally
* 15:39 mdholloway: mobileapps deployed 869ec35
* 15:37 bblack: rebooting kafka1012
* 15:36 bblack: kafka1012: manually edited fstab, s/sdb1/sdb3/, s/sdc3/sdc1/, and now the filesystems mount and data looks right
* 15:23 bblack: powering up kafka1012
* 14:09 moritzm: rebooting serpens/seaborgium for kernel update
* 13:58 logmsgbot: faidon@mira Synchronized wmf-config/InitialiseSettings.php: depool kafka1012 (duration: 02m 10s)
* 13:31 bblack: citoid and cxserver public hostnames moving to cache_text
* 12:59 moritzm: rebooting rutherfordium (peopleweb) for kernel update
* 12:53 elukey: stopping kafka on kafka1012 + host reboot for kernel upgrade
* 12:23 jynus: generating empty schema for new codfw parsercaches
* 12:14 logmsgbot: jynus@mira Synchronized wmf-config/db-codfw.php: New parsercache servers for codfw datacenter (duration: 03m 10s)
* 12:11 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: New parsercache servers for codfw datacenter (duration: 02m 15s)
* 12:07 jynus: pooling new parsercaches for codfw datacenter
* 12:01 moritzm: powercycled mw1163, was unreachable after reboot of the jobrunners (but now up again after powercycle via mgmt)
* 11:31 elukey: disabled puppet on analytics1027 due to some issues with camus and hdfs
* 10:42 moritzm: rebooted parsoid systems in codfw for kernel update, rolling reboot for eqiad
* 10:39 _joe_: rolling reboot of jobrunners in eqiad
* 02:46 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 06m 16s)
* 02:41 logmsgbot: tgr@mira Synchronized php-1.27.0-wmf.11/includes/: deploy SessionManager patch for T124971: gerrit 266944, 266946 (duration: 03m 20s)
* 02:27 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 10m 21s)
* 01:03 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264460/ (duration: 02m 30s)
* 00:58 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264066/ (duration: 02m 26s)
* 00:46 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Gather/resources: https://gerrit.wikimedia.org/r/#/c/266793/ and https://gerrit.wikimedia.org/r/#/c/266792/ (duration: 02m 23s)
* 00:41 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.11/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/266939/ (duration: 02m 27s)
* 00:27 logmsgbot: krenair@mira Synchronized php-1.27.0-wmf.10/extensions/Flow/includes: https://gerrit.wikimedia.org/r/#/c/266938/ (duration: 02m 29s)
* 00:09 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/266945/ (duration: 02m 36s)


== 2016-01-27 ==
== 2023-05-19 ==
* 22:36 robh: restarting parsoid-rt-client service on ruthenium
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:29 ottomata: starting mysqldump of MobileWebSectionUsage_14321266 from db1047 into m4-master
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:45 yurik: updated graphoid on scb*
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:29 mdholloway: mobileapps deployed 6f35859
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 21:26 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 21:26 logmsgbot: ori@mira Synchronized docroot and w: (no message) (duration: 02m 26s)
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 19:48 YuviPanda: started nfs-exports daemon on labstore1001, had been dead for a few days
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 19:32 mutante: stat1002 - redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out.
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 19:31 mutante: stat1002 - running puppet, was reported as last run about 4 hours ago but not deactivated
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 19:14 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 19:07 ejegg: set donation queue consumer time limit back to 90 sec
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 18:49 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool pc1006 after cloning (duration: 02m 25s)
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]]
* 18:48 bd808: HHVM on mw1019 still dying on a regular basis with "Lost parent, LightProcess exiting"
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 18:00 csteipp: deploy patch for T103239
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:50 csteipp: deploy patch for T97157
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 17:47 jynus: migrating ruthenium parsoid-test database to m5-master
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)
* 17:27 elukey: rebooting analytics105* hosts to upgrade their kernel
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 17:16 elukey: rebooting analytics1035.eqiad.wmnet for kernel upgrade
* 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 16:23 ejegg: updated SmashPig from 072c7ec6ed94e7074ba35b7986d5dde94866fe2f to 97629339994bffe8831a9067f5e9c21fa423586b
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:22 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/CentralAuthUtils.php: SWAT: Preserve certain keys when updating central session [[gerrit:266672]] (duration: 02m 28s)
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]]
* 16:11 logmsgbot: thcipriani@mira Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: SWAT: Avoid forceHTTPS cookie flapping if core and CA are setting the same cookie [[gerrit:266671]] (duration: 02m 26s)
* 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 16:03 elukey: rebooting analytics 1043 -> 1050 for kernel upgrade.
* 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 15:47 elukey: rebooting analytics 1026, 1040 -> 1042 due to kernel upgrade.
* 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:58 jynus: cloning parsercache contents from pc1003 to pc1006
* 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
* 14:45 elukey: rebooting analytics 1036 to 1039 for kernel upgrade
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
* 14:35 elukey: analytics 1035 hasn't been rebooted because it is a Hadoop Journal Node (will be restarted in the end)
* 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 14:04 elukey: rebooting analytics 1032 to 1035 for kernel upgrades
* 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 14:03 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1003 for cloning to pc1006 (duration: 02m 30s)
* 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
* 13:59 jynus: about to move to new hardware/OS/mariadb-only for parsercache service
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:32 elukey: rebooting analytics1030/1031 for kernel upgrade
* 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
* 13:15 akosiaris: rebooting fermium for kernel upgrades
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:10 elukey: rebooting analytics1029 for kernel upgrade
* 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]])
* 12:29 moritzm: rebooting analytics1028 for kernel update
* 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 10:25 ema: restarting apache2 and hhvm on mw1119
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 03:19 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: Correct invalid cirrus shard configuration (duration: 02m 59s)
* 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
* 02:55 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 27 02:55:21 UTC 2016 (duration 7m 13s)
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
* 02:48 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 25s)
* 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 51s)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:59 logmsgbot: ori@mira Synchronized docroot and w: Icc4f6134b0: Add a speed experiment which inlines the top stylesheet (duration: 02m 28s)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 01:29 MaxSem: on terbium: ran mwscript namespaceDupes.php --wiki=wuuwiki --source-pseudo-namespace='' --add-suffix=/renamed --fix
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 01:26 MaxSem: Fail, trying something else...
* 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 01:21 MaxSem: running mwscript namespaceDupes.php  --wiki=wuuwiki --move-talk --fix
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 00:52 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266497/ (duration: 02m 26s)
* 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 00:48 logmsgbot: krenair@mira Synchronized w/static/images/project-logos/ukwikinews.png: https://gerrit.wikimedia.org/r/#/c/266497/ (duration: 02m 29s)
* 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 00:44 logmsgbot: krenair@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266161/ (duration: 02m 27s)
* 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 00:15 logmsgbot: ebernhardson@mira Synchronized php-1.27.0-wmf.11/extensions/CirrusSearch/: Allow pointing morelike queries at a specific datacenter (duration: 03m 04s)
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 00:10 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-production.php: point morelike queries back at the eqiad cluster (duration: 05m 41s)
* 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
* 00:02 chasemp: enable puppet and codify the 192 thread count for nfsd
* 10:07 moritzm: installing ncurses security updates
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json


== 2016-01-26 ==
== 2023-05-18 ==
* 22:25 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11, for real this time
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 22:17 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 22:15 logmsgbot: dduvall@mira Synchronized php-1.27.0-wmf.11: syncing wmf.11 backports of session fixes (duration: 03m 55s)
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
* 21:55 logmsgbot: ori@mira Synchronized docroot and w: I9b054d847a: New set of speed experiments (duration: 01m 29s)
* 22:20 mutante: short down-time for zuul-merger on contint2001
* 21:41 marxarelli: filed https://phabricator.wikimedia.org/T124828 for fatal in extensions/Echo
* 21:47 mutante: maintenance for zuul (CI) on contint servers
* 21:22 marxarelli: Fatal error: Cannot redeclare class CallbackFilterIterator in /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php on line 24
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 21:21 marxarelli: lint error found when running sync-dir 'Errors parsing /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php'
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s)
* 21:11 marxarelli: sync-dir php linting failed
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:02 marxarelli: resuming sync-dir and ignoring error as a known issue
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]]
* 20:59 marxarelli: getting 'Lost parent, LightProcess exiting' when running sync-dir
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s)
* 20:57 chasemp: drop labstore1001 nfs threads down to 192
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:46 chasemp: stopping nfs on labstore1001
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]]
* 20:46 marxarelli: modified wikiversions.php locally on mw1017 to promote all wikis to wmf.11 for initial testing
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:18 marxarelli: locally modified wikiversions.php and wikiversions.json on mw1017 for testing
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:14 marxarelli: running 'sync-common --verbose deployment.eqiad.wmnet' on mw1017 to sync wmf.11 for initial testing
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s)
* 20:02 marxarelli: proceeding with train deploy. wmf.11 to mw1017, then group0
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 19:46 akosiaris: issuing a varnish ban on all esams mobile frontend varnish for req.http.host .*wikimedia.org
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]]
* 19:45 akosiaris: issuing a varnish ban on all esams mobile backend varnish for req.http.host .*wikimedia.org
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s)
* 19:44 akosiaris: issuing a varnish ban on all ulsfo mobile frontend varnish for req.http.host .*wikimedia.org
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]]
* 19:44 akosiaris: issuing a varnish ban on all ulsfo mobile backend varnish for req.http.host .*wikimedia.org
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 19:43 akosiaris: issuing a varnish ban on all codfw mobile frontend varnish for req.http.host .*wikimedia.org
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8  refs [[phab:T330215|T330215]]
* 19:36 akosiaris: issuing a varnish ban on all codfw mobile backend varnish for req.http.host .*wikimedia.org
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 19:36 akosiaris: issuing a varnish ban on all eqiad mobile frontend varnish for req.http.host .*wikimedia.org
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:36 akosiaris: issuing a varnish ban on all eqiad mobile backend varnish for req.http.host .*wikimedia.org
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 19:36 akosiaris: all of the above referred to cache_text
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 19:29 akosiaris: all of the above already done, back logging
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 19:29 akosiaris: issuing a varnish ban on all esams frontend varnish for req.http.host .*wikimedia.org
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:29 akosiaris: issuing a varnish ban on all esams backend varnish for req.http.host .*wikimedia.org
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 19:29 akosiaris: issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 19:29 akosiaris: issuing a varnish ban on all ulsfo frontend varnish for req.http.host .*wikimedia.org
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 19:28 akosiaris: issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 19:28 akosiaris: issuing a varnish ban on all codfw frontend varnish for req.http.host .*wikimedia.org
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 19:28 akosiaris: issuing a varnish ban on all codfw backend varnish for req.http.host .*wikimedia.org
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 19:28 akosiaris: issuing a varnish ban on all eqiad frontend varnish for req.http.host .*wikimedia.org
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 akosiaris: issuing a varnish ban on all eqiad backend varnish for req.http.host .*wikimedia.org
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 19:02 marxarelli: backports to wmf.11 ready on mira but delaying train due to wikimedia.org outage
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:44 _joe_: running salt --batch-size=20 -C 'G@cluster:appserver and G@site:eqiad' cmd.run 'puppet agent -t --tags mw-apache-config'
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:27 robh: i broke icinga, but then i fixed it, icinga back to normal.
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:21 robh: icinga is broken, it seems it was from a change before mine, but my forced reload broke it
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:18 legoktm: running mwscript updateArticleCount.php --wiki=jawiki --update=1
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:14 cmjohnson1: starting puppet on mw cluster
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:14 robh: i broke icinga, fixing
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:08 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Pool new parsercache pc1005 after cloning it from pc1002 (duration: 01m 28s)
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:43 thcipriani: ltwiki collation updated 503623 rows processed
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:35 mutante: mw1258 - restart hhvm
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:20 cmjohnson: disabling puppet on mw cluster
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:02 thcipriani: running updateCollation on ltwiki
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 17:01 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set category collation to uca-lt on lt.wikipedia [[gerrit:266427]] (duration: 01m 33s)
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]]
* 16:55 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on ur.wikipedia [[gerrit:265888]] (duration: 07m 10s)
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:36 logmsgbot: thcipriani@mira Synchronized w/static/images/project-logos/etwikiquote.png: SWAT: Update et.wikiquote logo [[gerrit:265623]] (duration: 01m 27s)
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:31 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink on nl.wikiquote [[gerrit:265666]] (duration: 01m 26s)
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
* 16:26 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespaces configuration on sk.wikipedia [[gerrit:265896]] (duration: 01m 27s)
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:19 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove Tranwiki namespace on wuu.wikipedia [[gerrit:265892]] and Add Portal namespace on wuu.wikipedia [[gerrit:265893]] (duration: 01m 27s)
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:12 logmsgbot: thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration for wuu.wikipedia [[gerrit:265891]] (duration: 01m 29s)
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]]
* 14:57 ema: Finished migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 14:48 chasemp: RPS on eth0 on labstores
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 14:39 bblack: upgrading packages (incl kernel) on all ulsfo caches (cp4xxx)
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
* 14:21 akosiaris: migrating alsafi,mx2001 back to 2004 for testing
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
* 14:14 akosiaris: migrate alsafi,mx2001 back from ganeti2004 to fix a network misconfiguration
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 13:32 moritzm: rebooted nescio/maerlant for kernel update
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
* 13:14 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1002 for maintenance (clone to pc1005) (duration: 01m 39s)
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 12:39 akosiaris: rolling reboot of ganeti200{1,2,3,4,5,6}.codfw.wmnet for kernel upgrade
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 12:10 moritzm: rebooting mx2001/mx1001 (with a delay in between) for kernel update
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 11:50 moritzm: rebooting etherpad1001 for kernel update
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 11:46 moritzm: rebooting bromine for kernel update
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
* 10:50 ema: Starting migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:28 _joe_: finishing reboots of appservers in eqiad
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 04:27 legoktm: restarted resetGlobalUserTokens.php after it lost mysql connection again
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 02:31 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 26 02:30:58 UTC 2016 (duration 7m 0s)
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
* 02:24 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 36s)
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 01:45 logmsgbot: krenair@mira Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/266453/ (duration: 01m 27s)
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
* 00:45 mobrovac: mobileapps deploying c2318b6
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
* 00:40 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 25s)
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
* 00:37 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT bd808 (duration: 01m 34s)
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
* 00:32 logmsgbot: ebernhardson@mira Synchronized portals/: SWAT jgirault (duration: 01m 28s)
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
* 00:29 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT ebernhardson (duration: 01m 26s)
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
* 00:27 logmsgbot: ebernhardson@mira Synchronized wmf-config/CirrusSearch-common.php: SWAT ebernhardson (duration: 01m 26s)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 00:25 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT ebernhardson (duration: 01m 27s)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 00:15 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT AaronSchulz (duration: 01m 26s)
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 00:13 logmsgbot: ebernhardson@mira Synchronized wmf-config/filebackend-production.php: SWAT AaronSchulz (duration: 01m 26s)
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 00:10 logmsgbot: ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT James_F (duration: 01m 26s)
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 00:08 logmsgbot: ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT James_F (duration: 01m 35s)
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:18 TheresNoTime: closing backport window
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s)
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]]
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 06m 19s)
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 07m 00s)
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
* 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
* 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
* 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
* 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
* 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
* 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
* 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
* 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers ([[phab:T320508|T320508]])
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 ([[phab:T336483|T336483]])
* 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
* 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
* 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
* 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
* 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
* 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
* 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
* 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2
* 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
* 07:25 apergos: UTC morning backport and config training window done
* 07:15 kartik@deploy1002: Finished scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 18s)
* 07:07 kartik@deploy1002: kartik: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:06 kartik@deploy1002: Started scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl [[phab:T336833|T336833]]', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json
* 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
* 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance


== 2016-01-25 ==
== 2023-05-17 ==
* 23:14 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/parser/: live hacks, now committed (duration: 01m 27s)
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:07 logmsgbot: legoktm@mira Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/266410/ (duration: 01m 35s)
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
* 22:52 logmsgbot: ori@mira Synchronized php-1.27.0-wmf.10/includes/parser/ParserOutput.php: Fix-up for ParserOutput.php@263 debug logging (duration: 01m 27s)
* 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
* 22:30 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/parser/: https://gerrit.wikimedia.org/r/#/c/266401/ + https://gerrit.wikimedia.org/r/#/c/266406/ + live hacks (duration: 01m 28s)
* 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 22:28 logmsgbot: legoktm@mira Synchronized php-1.27.0-wmf.10/includes/content/WikitextContent.php: https://gerrit.wikimedia.org/r/#/c/266401/ (duration: 01m 29s)
* 22:15 krinkle@deploy1002: Synchronized wmf-config/: [[phab:T332012|T332012]] (duration: 06m 51s)
* 21:53 logmsgbot: hoo@mira Synchronized wmf-config/Wikibase-production.php: Disable (not yet deployed) commons category sidebar link overwrite in production (duration: 01m 28s)
* 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
* 21:47 mutante: nitrogen - shutdown -h now ....
* 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 21:45 mutante: alsafi - was reported down in icinga , is ganeti VM - fixed by just logging in as if it went to hibernate
* 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 21:37 mdholloway: mobileapps deployed 9252a22
* 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request [[:phab:T333842{{!}}T333842]]"
* 21:30 mutante: nitrogen - stop puppet, stop salt, remove from stored configs / icinga
* 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
* 20:19 logmsgbot: hoo@mira Synchronized wmf-config/Wikibase-labs.php: (no message) (duration: 01m 28s)
* 20:32 urbanecm: UTC late B&C window done
* 20:14 chasemp: bump labstore nfs threads to 288 from 244
* 20:29 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] (duration: 11m 36s)
* 19:32 paravoid: eqiad: removing static routes for 6to4/Teredo to nitrogen (decommissioning our own relays)
* 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.
* 19:10 bd808: Live hacking on mw1017 to debug 1.27.0-wmf.11 issues. All wikis there currently set to use 1.27.0-wmf.11.
* 20:17 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]]
* 19:05 chasemp: labstore1001 temp change to CFQ scheduler on 01/22/2015
* 20:15 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] (duration: 12m 06s)
* 19:04 chasemp: the nfsd thread change is on labstore1001
* 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
* 19:04 chasemp: nfsd has 224 threads atm and was bumped up over the weekend
* 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
* 18:58 ori: removed unused wikiversions.cdb on mira and tin
* 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
* 18:28 jynus: retroactively logging the depool of mw1217, mw1178 and mw1257 3 hours ago (Jan 25 15:45:26)
* 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:49 ema: Finished migration of mobile traffic to text cluster in ulsfo https://phabricator.wikimedia.org/T109286
* 20:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]]
* 16:38 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (third try) (duration: 01m 35s)
* 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:26 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (second try after running puppet) (duration: 03m 23s)
* 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:25 _joe_: restarting salt-minion on all deployment targets
* 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
* 16:24 _joe_: running salt deploy.fixurl on all deployment targets
* 19:50 ejegg: payments-wiki upgraded from {{Gerrit|8988a598}} to {{Gerrit|a7567c6a}}
* 16:09 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Preparing ips for new parsercache deployments (duration: 03m 32s)
* 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update [[phab:T331297|T331297]]
* 15:51 ejegg: updated DjangoBannerStats from a64fe0e373a978d3df0b7f1dd74ac4cc5c78d34e to 71df14d4d8b11f3ca0ef1eeb6c6e2db9be79103a
* 19:01 Amir1: Removing db1112 from zarcillo [[phab:T336332|T336332]]
* 15:35 ema: Starting migration of mobile traffic to text cluster in ulsfo https://phabricator.wikimedia.org/T109286
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet
* 15:14 chasemp: restart of pdns and pdns-recursor on labservices1001
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:56 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: deploy new parsercache hardware (pc1004) substituting pc1001 (duration: 03m 25s)
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 13:16 elukey: ran kafka preferred-replica-election on kafka1022 to balance the leaders
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
* 13:07 elukey: restarting kafka on kafka1022
* 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 12:57 elukey: restarting kafka on kafka1013
* 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet
* 12:38 elukey: restarting kafka on kafka1014
* 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]] (duration: 06m 22s)
* 12:20 jynus: compressed and truncated iridium's phab daemons.log - it was taking 20% of disk space
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 12:04 ema: restarting kafka on kafka1018
* 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s)
* 11:26 jynus: stopping mysql at pc1001 and cloning to pc1004
* 18:03 brennen: train 1.41.0-wmf.9 ([[phab:T330215|T330215]]): no current blockers, rolling to group1 as backup-backup conductor
* 10:55 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1001 for maintenance (clone to pc1004) (duration: 01m 41s)
* 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795]
* 10:11 _joe_: switching the active deployment host to mira
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 ema: limiting GCLogFileSize and restarting kafka on kafka1012
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:31 _joe_: rolling reboot of the eqiad appserver cluster
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:27 moritzm: installed fuse security update on labnodepool1001 (the other fuse installations are on Ubuntu, which doesn't ship the udev rule, but uses mountall instead)
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:47 paravoid: stat1002: umount -f /mnt/hdfs
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
* 07:34 _joe_: rebooting alsafi, unresponsive to ssh
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
* 07:24 _joe_: restarting hhvm on mw1148, stuck in HPHP::Treadmill::startRequest (__lll_lock_wait)
* 17:19 brett: Maglev LVS scheduler rollout finished in esams - [[phab:T263797|T263797]]
* 07:23 _joe_: restarting hhvm on mw1143, stuck into HPHP::SynchronizableMulti::waitImpl (__pthread_cond_wait)
* 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for [[phab:T244570|T244570]]
* 03:10 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserCache.php: (no message) (duration: 00m 25s)
* 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 03:03 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserCache.php: (no message) (duration: 00m 25s)
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json
* 03:02 logmsgbot: tstarling@tin Synchronized php-1.27.0-wmf.10/includes/parser/ParserOutput.php: (no message) (duration: 00m 27s)
* 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 02:30 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 25 02:30:13 UTC 2016 (duration 6m 52s)
* 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json
* 02:23 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 09s)
* 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json
* 15:52 brett: Rolling out maglev LVS scheduler in esams - [[phab:T263797|T263797]]
* 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json
* 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json
* 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
* 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
* 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json
* 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
* 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json
* 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
* 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 06m 20s)
* 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
* 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
* 14:36 moritzm: installing jackson-databind security updates
* 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] (duration: 00m 09s)
* 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]]
* 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]]
* 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
* 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
* 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - [[phab:T336817|T336817]]
* 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - [[phab:T336726|T336726]]
* 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs
* 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 07m 30s)
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
* 13:59 taavi@deploy1002: Finished scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] (duration: 07m 24s)
* 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
* 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
* 13:54 taavi@deploy1002: matmarex and taavi: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:52 taavi@deploy1002: Started scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]]
* 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
* 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
* 13:47 taavi@deploy1002: Finished scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] (duration: 08m 11s)
* 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs
* 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
* 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
* 13:40 taavi@deploy1002: taavi and maurelio: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 taavi@deploy1002: Started scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]]
* 13:38 taavi@deploy1002: Finished scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] (duration: 07m 39s)
* 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
* 13:32 taavi@deploy1002: stang and taavi: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:30 taavi@deploy1002: Started scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]]
* 13:29 taavi@deploy1002: Finished scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] (duration: 09m 15s)
* 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
* 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs
* 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
* 13:20 taavi@deploy1002: Started scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]]
* 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:18 daniel@deploy1002: Finished scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] (duration: 11m 52s)
* 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs
* 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary
* 13:07 daniel@deploy1002: daniel: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:06 daniel@deploy1002: Started scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]]
* 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
* 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json
* 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
* 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
* 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
* 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json
* 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
* 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
* 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json
* 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json
* 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json
* 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json
* 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json
* 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json
* 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json
* 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json
* 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json
* 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
* 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
* 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json
* 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
* 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
* 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
* 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
* 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
* 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json
* 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json
* 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json
* 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
* 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json
* 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json
* 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json
* 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json
* 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
* 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
* 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - [[phab:T336726|T336726]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json
* 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json
* 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json
* 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 07:48 moritzm: upgrading krb1001 to Bullseye [[phab:T331695|T331695]]
* 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
* 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468
* 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json
* 07:19 kartik@deploy1002: Finished scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] (duration: 07m 22s)
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json
* 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json
* 07:11 kartik@deploy1002: Started scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]]
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json
* 07:09 kartik@deploy1002: Backport cancelled.
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json
* 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json
* 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json
* 06:01 volans: restarted ferm on ms-be1047
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json
* 05:16 marostegui: Optimize s7 on dbstore1003 [[phab:T336733|T336733]]
* 00:21 krinkle@deploy1002: Synchronized src/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 01s)
* 00:15 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 14s)
* 00:07 krinkle@deploy1002: Synchronized lib/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 51s)


== 2023-05-16 ==
* 20:59 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] (duration: 07m 18s)
* 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:52 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]]
* 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 20:49 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] (duration: 09m 19s)
* 20:41 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:39 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]]
* 20:36 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] (duration: 07m 44s)
* 20:30 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - [[phab:T263797|T263797]]
* 20:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]]
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox
* 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 19:04 sukhe: dummy run of authdns-update to confirm new hosts
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
* 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
* 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
* 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
* 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
* 18:47 ryankemper: [WDQS] Pooled `wdqs2012`
* 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled)
* 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": [[phab:T335777|T335777]]
* 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
* 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: [[phab:T326688|T326688]]
* 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": [[phab:T326688|T326688]]
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
* 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
* 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
* 18:01 sukhe: enable puppet on A:cp-text
* 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
* 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
* 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
* 17:40 moritzm: installing avahi security updates on buster
* 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
* 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s)
* 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937]
* 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
* 17:27 brett: Rolling out maglev LVS scheduler in drmrs - [[phab:T263797|T263797]]
* 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
* 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
* 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org
* 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" [[phab:T335777|T335777]]
* 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts
* 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
* 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help )
* 16:06 mutante: gitlab-runner2003 - installed rsync client to debug an issue with rsync from inside containers, comparing against rsync from outside the container
* 15:49 sukhe: run authdns-update for CR 920314
* 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s)
* 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd]
* 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now [[phab:T336775|T336775]]
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: [[phab:T326688|T326688]]
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]
* 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 15:27 hashar: Restarting CI Jenkins
* 15:26 Emperor: rebalance codfw swift rings [[phab:T335280|T335280]]
* 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/
* 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:49 moritzm: installing libxml2 security updates on buster
* 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]]
* 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:43 hashar: Restarting CI Jenkins
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]]
* 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
* 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s)
* 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
* 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]]
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
* 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]]
* 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad
* 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 13:46 Emperor: repool ms-fe2012 [[phab:T335042|T335042]]
* 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web
* 13:32 taavi@deploy1002: Finished scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] (duration: 09m 08s)
* 13:32 Emperor: repool thanos-fe2003 [[phab:T335042|T335042]]
* 13:30 sukhe: running authdns-update to repool codfw
* 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
* 13:25 taavi@deploy1002: mazevedo and taavi: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]]
* 13:23 taavi@deploy1002: Started scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]]
* 13:01 XioNoX: asw-d-codfw> request system reboot all-members - [[phab:T335042|T335042]]
* 12:52 Emperor: depool ms-fe2012 [[phab:T335042|T335042]]
* 12:51 Emperor: depool thanos-fe2003 [[phab:T335042|T335042]]
* 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]]
* 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade
* 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade
* 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
* 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 12:35 godog: start cadvisor 0.44 upgrade to buster hosts - [[phab:T336740|T336740]]
* 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s)
* 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2]
* 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s)
* 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
* 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]]
* 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]]
* 12:21 XioNoX: disable ping offload in codfw - [[phab:T335042|T335042]]
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s)
* 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
* 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 11:59 kart_: Updated cxserver to 2023-05-16-061239-production ([[phab:T336657|T336657]])
* 11:57 XioNoX: stage upgrade on asw-d-codfw - [[phab:T335042|T335042]]
* 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s)
* 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw
* 11:50 marostegui: install 10.4.29 on db1151 [[phab:T336462|T336462]]
* 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw
* 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2]
* 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm
* 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance
* 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance
* 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
* 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm
* 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm
* 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm
* 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 11:00 moritzm: updated bookworm image to RC3 [[phab:T330495|T330495]]
* 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
* 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]]
* 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet
* 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
* 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
* 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - [[phab:T317799|T317799]]
* 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]]
* 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 10:13 Amir1: cleaning up echo notification table in all wikis ([[phab:T318523|T318523]])
* 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s)
* 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided)
* 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner
* 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts
* 09:21 marostegui: Optimize s5 on dbstore1003 [[phab:T336733|T336733]]
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
* 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
* 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 07:28 Emperor: restart vopsbot.service on alert1001
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
* 06:56 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] (duration: 06m 58s)
* 06:51 marostegui@deploy1002: marostegui: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 06:50 eileen: civicrm: revision {{Gerrit|d97a371e}}, config {{Gerrit|686d3cb4}}
* 06:49 marostegui@deploy1002: Started scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]]
* 06:49 _joe_: running docker image prune -a in build2001
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json
* 06:24 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] (duration: 07m 08s)
* 06:24 eileen: civicrm upgraded from {{Gerrit|ef7b3822}} to {{Gerrit|d97a371e}}
* 06:18 marostegui@deploy1002: marostegui: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 06:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json
* 06:05 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] (duration: 07m 21s)
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json
* 05:59 marostegui@deploy1002: marostegui: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json
* 05:58 marostegui@deploy1002: Started scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json
* 05:43 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] (duration: 07m 15s)
* 05:38 marostegui@deploy1002: marostegui: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 05:36 marostegui@deploy1002: Started scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]] (duration: 48m 47s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]


== 2016-01-23 ==
== 2023-05-15 ==
* 19:03 logmsgbot: ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: config change to repoint morelike search from eqiad to codfw (duration: 00m 26s)
* 23:37 eileen: civicrm upgraded from {{Gerrit|db6e8d69}} to {{Gerrit|ef7b3822}}
* 19:02 logmsgbot: ebernhardson@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/: Support code for repointing morelike queries from eqiad to codfw (duration: 00m 30s)
* 22:02 maryum: deployed patch for [[phab:T323651|T323651]]
* 19:00 ebernhardson: repoint most expensive search queries (morelike) at codfw cluster to reduce load. 1/2 of eqiad cluster maxed on cpu
* 21:51 maryum: Deployed patch for [[phab:T335612|T335612]]
* 16:47 Krinkle: mwscript deleteEqualMessages.php --wiki wowiki
* 21:42 ejegg: payments-wiki upgraded from {{Gerrit|c0da741f}} to {{Gerrit|8988a598}} (and globalcollect settings deleted)
* 13:25 jynus: upgrading and restarting db1046
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:13 jynus: db1046 maintenance finished - restarting mysql to apply latest configuration
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:32 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 23 02:32:15 UTC 2016 (duration 7m 3s)
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 02:25 logmsgbot: mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 09s)
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 01:33 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Back to 1.27.0-wmf10 again after fixing l10n cache problems
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 01:28 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Temporarily back to 1.27.0-wmf11; need to rebuild l10n cache
* 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 01:16 logmsgbot: bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10
* 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 00:08 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265872/ (duration: 00m 25s)
* 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 00:07 logmsgbot: bd808@tin Synchronized php-1.27.0-wmf.11/includes/session/CookieSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265871/ (duration: 00m 25s)
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]]
* 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance
* 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s)
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS
* 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
* 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s)
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
* 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s)
* 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet
* 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124
* 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - [[phab:T336434|T336434]]
* 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] [[phab:T335042|T335042]]
* 18:33 brett: Rolling out maglev LVS scheduler in eqsin - [[phab:T263797|T263797]]
* 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
* 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
* 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
* 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
* 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet