
Server Admin Log: Difference between revisions

imported>Stashbot
(RoanKattouw: Ran namespaceDupes.php on tiwiki and tiwiktionary for T251287)
imported>Stashbot
(eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster)
 
(845 intermediate revisions by 4 users not shown)
== 2020-05-20 ==
== 2022-12-09 ==
* 00:05 RoanKattouw: Ran namespaceDupes.php on tiwiki and tiwiktionary for [[phab:T251287|T251287]]
* 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster
* 00:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set sitename and meta namespace localizations for tiwiki and tiwiktionary ([[phab:T251287|T251287]]) (duration: 01m 06s)
* 01:07 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2003
* 01:06 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2003
* 01:06 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2002
* 01:05 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2002
* 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
* 01:04 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
* 01:01 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2003.codfw.wmnet
* 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:45 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:43 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:39 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2003.codfw.wmnet
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2002.codfw.wmnet
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:37 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:34 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:30 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2002.codfw.wmnet


== 2020-05-19 ==
== 2022-12-08 ==
* 23:59 RoanKattouw: Ran namespaceDupes.php on jvwiki and jvwiktionary for [[phab:T252754|T252754]]
* 23:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]] (duration: 08m 42s)
* 23:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Insider/includes/InsiderHooks.php: [[phab:T252846|T252846]] Use SidebarBeforeOutput hook with correct format (duration: 01m 06s)
* 23:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 23:55 catrope@deploy1001: Finished scap: i18n scap for namespace localizations ([[phab:T251287|T251287]], [[phab:T252754|T252754]]) (duration: 62m 26s)
* 23:23 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]]
* 22:53 catrope@deploy1001: Started scap: i18n scap for namespace localizations ([[phab:T251287|T251287]], [[phab:T252754|T252754]])
* 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS buster
* 18:46 herron: performing rolling restarts of codfw/eqiad ELK clusters for java updates
* 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
* 18:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant template editors editcontentmodel on enwiki ([[phab:T253081|T253081]]) (duration: 01m 06s)
* 23:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
* 18:35 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments features on frwiki ([[phab:T252420|T252420]]) (duration: 01m 08s)
* 23:11 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]] (duration: 09m 12s)
* 17:09 arturo: added tesseract suite to stretch-wikimedia component/tesseract-410-bpo ([[phab:T247422|T247422]])
* 23:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:24 godog: power cycle thanos-fe* / thanos-be*
* 23:02 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]]
* 15:23 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2073 into s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11236 and previous config saved to /var/cache/conftool/dbconfig/20200519-152340-kormat.json
* 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
* 15:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
* 15:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS buster
* 15:16 cdanis: canary on ~150 hosts looks great, re-enabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical'  'enable-puppet "cdanis deploying I68c97d5"'
* 22:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 15:04 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:28 TheresNoTime: close UTC late backport and config training (+28m)
* 15:04 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 22:27 samtar@deploy1002: Finished scap: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]] (duration: 09m 57s)
* 14:59 moritzm: installing fuse update from Buster point release
* 22:19 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:47 cdanis: disabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical'  'disable-puppet "cdanis deploying I68c97d5"'
* 22:17 samtar@deploy1002: Started scap: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]]
* 14:38 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:16 samtar@deploy1002: Finished scap: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]] (duration: 10m 06s)
* 14:26 XioNoX: Set minimum-links 2 to AMS-IX LACP - [[phab:T253122|T253122]]
* 22:08 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:53 XioNoX: configure new AMS-IX port as quarantine - [[phab:T251121|T251121]]
* 22:06 samtar@deploy1002: Started scap: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]]
* 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 22:05 samtar@deploy1002: Finished scap: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]] (duration: 11m 04s)
* 13:09 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 13:09 jayme: updated helm: 2.16.7-1 -> 2.16.7-2 on deploy[1,2]001 and contint[1,2]001
* 21:56 samtar@deploy1002: samtar and stang: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:09 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:54 samtar@deploy1002: Started scap: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]]
* 13:03 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2136 into s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11233 and previous config saved to /var/cache/conftool/dbconfig/20200519-130313-kormat.json
* 21:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]] (duration: 09m 16s)
* 12:40 ariel@deploy1001: Finished deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good (duration: 00m 04s)
* 21:46 samtar@deploy1002: samtar and stang: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:40 ariel@deploy1001: Started deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good
* 21:44 samtar@deploy1002: Started scap: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]]
* 12:37 jayme: imported helm 2.16.7-2 to main for buster-wikimedia, stretch-wikimedia, jessie-wikimedia
* 21:39 TheresNoTime: [[phab:T324348|T324348]] : `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php specieswiki geodata`
* 12:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 21:37 samtar@deploy1002: Finished scap: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]] (duration: 08m 01s)
* 11:51 jynus: starting backups of es1, es2, es3 on eqiad into backup1002
* 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 11:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019', diff saved to https://phabricator.wikimedia.org/P11232 and previous config saved to /var/cache/conftool/dbconfig/20200519-114148-jynus.json
* 21:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 11:12 marostegui: Deploy schema change on db2124 (frwiki, jawiki, ruwiki) [[phab:T238966|T238966]]
* 21:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 10:34 mutante: releases2001 - restarted failed jenkins
* 21:31 samtar@deploy1002: samtar and stang: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:33 mutante: releases2001 - Failed to restart jenkins.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
* 21:29 samtar@deploy1002: Started scap: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]]
* 10:32 volans: flushed all Netbox caches (manage.py invalidate all) - [[phab:T253091|T253091]]
* 21:27 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]] (duration: 08m 31s)
* 10:29 volans: start Netbox restore - [[phab:T253091|T253091]]
* 21:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 10:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 21:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 10:13 akosiaris: upgrade etherpad-lite to 1.8.4 on etherpad1002
* 21:21 jdrewniak@deploy1002: jdrewniak and jdrewniak: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:58 hnowlan: roll-restart of eqiad restbase hosts for java security updates
* 21:19 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]]
* 09:58 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 21:17 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:866501{{!}} Bumping portals to master (T128546)]] (duration: 06m 55s)
* 09:55 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 21:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 21:10 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:866501{{!}} Bumping portals to master (T128546)]] (duration: 07m 07s)
* 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 20:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 09:10 godog: eqiad-prod: decom ms-be101[678] - [[phab:T252008|T252008]]
* 20:34 ryankemper: [Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005
* 08:07 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqsin
* 20:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 08:04 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - esams
* 20:27 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 08:01 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqiad
* 20:27 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 07:55 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide:  (duration: 00m 06s)
* 20:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 07:54 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
* 20:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for [[phab:T322776|T322776]]
* 07:52 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - *dfw
* 20:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for [[phab:T322776|T322776]]
* 07:49 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - ulsfo
* 20:17 ryankemper: [[phab:T323064|T323064]] Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1
* 07:45 vgutierrez: rolling upgrade to trafficserver 8.0.7-1wm10 with puppet disabled on cp hosts
* 20:12 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]]
* 07:09 jynus: starting es4 & es5 eqiad backups with low concurrency
* 20:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 06:35 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 19:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 06:29 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 19:59 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 06:24 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 06:17 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 16:14 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001
* 05:57 volker-e@deploy1001: Finished deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide:  (duration: 00m 06s)
* 16:14 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001
* 05:57 volker-e@deploy1001: Started deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide:
* 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only=off for maintenance [[phab:T251981|T251981]]', diff saved to https://phabricator.wikimedia.org/P11227 and previous config saved to /var/cache/conftool/dbconfig/20200519-050346-marostegui.json
* 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only for maintenance [[phab:T251981|T251981]]', diff saved to https://phabricator.wikimedia.org/P11226 and previous config saved to /var/cache/conftool/dbconfig/20200519-050043-marostegui.json
* 16:12 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
* 04:27 marostegui: Repool labsdb1011 [[phab:T249188|T249188]]
* 16:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 03:29 volker-e@deploy1001: Finished deploy [design/style-guide@4b4bc51]: Deploy design/style-guide: (duration: 00m 07s)
* 16:08 eevans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 03:28 volker-e@deploy1001: Started deploy [design/style-guide@4b4bc51]: Deploy design/style-guide:
* 16:08 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye
* 15:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
* 15:48 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
* 15:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
* 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
* 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 15:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
* 15:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
* 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye
* 15:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 15:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 15:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 15:12 hashar: Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # [[phab:T323754|T323754]]
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet
* 15:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 15:09 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 15:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 15:07 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 15:05 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2001.codfw.wmnet
* 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42650 and previous config saved to /var/cache/conftool/dbconfig/20221208-150109-ladsgroup.json
* 14:59 hashar: Restarting Gerrit replica TWICE on gerrit2002.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # [[phab:T323754|T323754]]
* 14:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:50 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:50 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:47 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42649 and previous config saved to /var/cache/conftool/dbconfig/20221208-144602-ladsgroup.json
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42648 and previous config saved to /var/cache/conftool/dbconfig/20221208-144152-ladsgroup.json
* 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42647 and previous config saved to /var/cache/conftool/dbconfig/20221208-144131-ladsgroup.json
* 14:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:40 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42646 and previous config saved to /var/cache/conftool/dbconfig/20221208-142625-ladsgroup.json
* 14:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:20 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:20 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:19 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:19 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 14:13 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42645 and previous config saved to /var/cache/conftool/dbconfig/20221208-141118-ladsgroup.json
* 14:10 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 14:09 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 14:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 14:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42644 and previous config saved to /var/cache/conftool/dbconfig/20221208-135611-ladsgroup.json
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42643 and previous config saved to /var/cache/conftool/dbconfig/20221208-135402-ladsgroup.json
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42642 and previous config saved to /var/cache/conftool/dbconfig/20221208-135341-ladsgroup.json
* 13:43 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42641 and previous config saved to /var/cache/conftool/dbconfig/20221208-133835-ladsgroup.json
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42640 and previous config saved to /var/cache/conftool/dbconfig/20221208-132329-ladsgroup.json
* 13:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142] (duration: 00m 15s)
* 13:20 aqu@deploy1002: Started deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142]
* 13:19 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142] (duration: 00m 09s)
* 13:19 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142]
* 13:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42639 and previous config saved to /var/cache/conftool/dbconfig/20221208-130822-ladsgroup.json
* 13:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42638 and previous config saved to /var/cache/conftool/dbconfig/20221208-130612-ladsgroup.json
* 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42637 and previous config saved to /var/cache/conftool/dbconfig/20221208-130551-ladsgroup.json
* 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42635 and previous config saved to /var/cache/conftool/dbconfig/20221208-125045-ladsgroup.json
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
* 12:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42634 and previous config saved to /var/cache/conftool/dbconfig/20221208-124435-ladsgroup.json
* 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42633 and previous config saved to /var/cache/conftool/dbconfig/20221208-123538-ladsgroup.json
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42632 and previous config saved to /var/cache/conftool/dbconfig/20221208-122928-ladsgroup.json
* 12:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 12:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42631 and previous config saved to /var/cache/conftool/dbconfig/20221208-122032-ladsgroup.json
* 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42630 and previous config saved to /var/cache/conftool/dbconfig/20221208-121823-ladsgroup.json
* 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42629 and previous config saved to /var/cache/conftool/dbconfig/20221208-121801-ladsgroup.json
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42628 and previous config saved to /var/cache/conftool/dbconfig/20221208-121422-ladsgroup.json
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42627 and previous config saved to /var/cache/conftool/dbconfig/20221208-120255-ladsgroup.json
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42626 and previous config saved to /var/cache/conftool/dbconfig/20221208-115915-ladsgroup.json
* 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42625 and previous config saved to /var/cache/conftool/dbconfig/20221208-115659-ladsgroup.json
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42624 and previous config saved to /var/cache/conftool/dbconfig/20221208-115627-ladsgroup.json
* 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42623 and previous config saved to /var/cache/conftool/dbconfig/20221208-114748-ladsgroup.json
* 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42622 and previous config saved to /var/cache/conftool/dbconfig/20221208-114120-ladsgroup.json
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42621 and previous config saved to /var/cache/conftool/dbconfig/20221208-113240-ladsgroup.json
* 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42620 and previous config saved to /var/cache/conftool/dbconfig/20221208-113030-ladsgroup.json
* 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42619 and previous config saved to /var/cache/conftool/dbconfig/20221208-112951-ladsgroup.json
* 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42618 and previous config saved to /var/cache/conftool/dbconfig/20221208-112612-ladsgroup.json
* 11:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267] (duration: 00m 18s)
* 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267]
* 11:21 moritzm: drain ganeti5002 for eventual decom [[phab:T324610|T324610]]
* 11:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267] (duration: 00m 09s)
* 11:20 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267]
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42617 and previous config saved to /var/cache/conftool/dbconfig/20221208-111444-ladsgroup.json
* 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42616 and previous config saved to /var/cache/conftool/dbconfig/20221208-111105-ladsgroup.json
* 11:10 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3, with 30 seconds in between [[phab:T323771|T323771]]
* 11:09 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3, with 30 seconds in between [[phab:T323771|T323771]]
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42615 and previous config saved to /var/cache/conftool/dbconfig/20221208-110849-ladsgroup.json
* 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42614 and previous config saved to /var/cache/conftool/dbconfig/20221208-110828-ladsgroup.json
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42613 and previous config saved to /var/cache/conftool/dbconfig/20221208-105938-ladsgroup.json
* 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3, with 30 seconds in between [[phab:T323771|T323771]]
* 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3, with 30 seconds in between [[phab:T323771|T323771]]
* 10:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42612 and previous config saved to /var/cache/conftool/dbconfig/20221208-105321-ladsgroup.json
* 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42611 and previous config saved to /var/cache/conftool/dbconfig/20221208-104432-ladsgroup.json
* 10:43 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3, with 30 seconds in between [[phab:T323771|T323771]]
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42610 and previous config saved to /var/cache/conftool/dbconfig/20221208-104322-ladsgroup.json
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42609 and previous config saved to /var/cache/conftool/dbconfig/20221208-104300-ladsgroup.json
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42608 and previous config saved to /var/cache/conftool/dbconfig/20221208-103815-ladsgroup.json
* 10:36 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]] (duration: 09m 17s)
* 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3, with 30 seconds in between
* 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3, with 30 seconds in between
* 10:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42606 and previous config saved to /var/cache/conftool/dbconfig/20221208-102754-ladsgroup.json
* 10:26 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]]
* 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42605 and previous config saved to /var/cache/conftool/dbconfig/20221208-102308-ladsgroup.json
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42604 and previous config saved to /var/cache/conftool/dbconfig/20221208-102052-ladsgroup.json
* 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42603 and previous config saved to /var/cache/conftool/dbconfig/20221208-102030-ladsgroup.json
* 10:18 hashar: contint1002: activated Icinga monitoring; all services are up and running # [[phab:T313832|T313832]]
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42602 and previous config saved to /var/cache/conftool/dbconfig/20221208-101247-ladsgroup.json
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42600 and previous config saved to /var/cache/conftool/dbconfig/20221208-100524-ladsgroup.json
* 10:01 claime: Deploying puppet enforcement of zuul-merger on contint1002
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42599 and previous config saved to /var/cache/conftool/dbconfig/20221208-095741-ladsgroup.json
* 09:57 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host test-reimage2001.codfw.wmnet
* 09:56 steve_munene: restarting varnishkafka-webrequest.service on host cp1075 [[phab:T323771|T323771]]
* 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42598 and previous config saved to /var/cache/conftool/dbconfig/20221208-095017-ladsgroup.json
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) test-reimage2001.codfw.wmnet on all recursors
* 09:50 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache test-reimage2001.codfw.wmnet on all recursors
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
* 09:49 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
* 09:46 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:46 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host test-reimage2001.codfw.wmnet
* 09:43 hashar: contint1002: stopped puppet and manually started zuul-merger. I am monitoring it because the last time we brought up a new one it had some issues here and there # [[phab:T313832|T313832]]
* 09:38 hashar: contint1001: manually stopped and masked zuul-merger. It is in maintenance mode in Icinga # [[phab:T313832|T313832]]
* 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42597 and previous config saved to /var/cache/conftool/dbconfig/20221208-093511-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42596 and previous config saved to /var/cache/conftool/dbconfig/20221208-093255-ladsgroup.json
* 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42595 and previous config saved to /var/cache/conftool/dbconfig/20221208-093218-ladsgroup.json
* 09:25 hashar@deploy1002: Finished deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # [[phab:T313832|T313832]] (duration: 00m 07s)
* 09:24 hashar@deploy1002: Started deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # [[phab:T313832|T313832]]
* 09:17 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # [[phab:T313832|T313832]] (duration: 00m 03s)
* 09:17 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # [[phab:T313832|T313832]]
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42594 and previous config saved to /var/cache/conftool/dbconfig/20221208-091712-ladsgroup.json
* 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42593 and previous config saved to /var/cache/conftool/dbconfig/20221208-090205-ladsgroup.json
* 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42592 and previous config saved to /var/cache/conftool/dbconfig/20221208-085724-ladsgroup.json
* 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42591 and previous config saved to /var/cache/conftool/dbconfig/20221208-085657-ladsgroup.json
* 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42590 and previous config saved to /var/cache/conftool/dbconfig/20221208-084659-ladsgroup.json
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42589 and previous config saved to /var/cache/conftool/dbconfig/20221208-084442-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42588 and previous config saved to /var/cache/conftool/dbconfig/20221208-084421-ladsgroup.json
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42587 and previous config saved to /var/cache/conftool/dbconfig/20221208-084151-ladsgroup.json
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42586 and previous config saved to /var/cache/conftool/dbconfig/20221208-082914-ladsgroup.json
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42585 and previous config saved to /var/cache/conftool/dbconfig/20221208-082644-ladsgroup.json
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42584 and previous config saved to /var/cache/conftool/dbconfig/20221208-081408-ladsgroup.json
* 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42583 and previous config saved to /var/cache/conftool/dbconfig/20221208-081138-ladsgroup.json
* 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42582 and previous config saved to /var/cache/conftool/dbconfig/20221208-075901-ladsgroup.json
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42581 and previous config saved to /var/cache/conftool/dbconfig/20221208-075645-ladsgroup.json
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42580 and previous config saved to /var/cache/conftool/dbconfig/20221208-075624-ladsgroup.json
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42579 and previous config saved to /var/cache/conftool/dbconfig/20221208-074117-ladsgroup.json
* 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42578 and previous config saved to /var/cache/conftool/dbconfig/20221208-073122-ladsgroup.json
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42577 and previous config saved to /var/cache/conftool/dbconfig/20221208-073101-ladsgroup.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42576 and previous config saved to /var/cache/conftool/dbconfig/20221208-072611-ladsgroup.json
* 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42575 and previous config saved to /var/cache/conftool/dbconfig/20221208-071554-ladsgroup.json
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42574 and previous config saved to /var/cache/conftool/dbconfig/20221208-071104-ladsgroup.json
* 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42573 and previous config saved to /var/cache/conftool/dbconfig/20221208-070847-ladsgroup.json
* 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42572 and previous config saved to /var/cache/conftool/dbconfig/20221208-070825-ladsgroup.json
* 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42571 and previous config saved to /var/cache/conftool/dbconfig/20221208-070048-ladsgroup.json
* 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 31800
* 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 31800
* 06:55 bblack: lvs1017: restarting pybal to take back text traffic (med reverted to normal, underlying problem w/ ipv6 addressed)
* 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42570 and previous config saved to /var/cache/conftool/dbconfig/20221208-065319-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42569 and previous config saved to /var/cache/conftool/dbconfig/20221208-064541-ladsgroup.json
* 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42568 and previous config saved to /var/cache/conftool/dbconfig/20221208-063813-ladsgroup.json
* 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42567 and previous config saved to /var/cache/conftool/dbconfig/20221208-062306-ladsgroup.json
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42566 and previous config saved to /var/cache/conftool/dbconfig/20221208-062050-ladsgroup.json
* 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42565 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42564 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
* 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42563 and previous config saved to /var/cache/conftool/dbconfig/20221208-062006-ladsgroup.json
* 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42562 and previous config saved to /var/cache/conftool/dbconfig/20221208-061436-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42561 and previous config saved to /var/cache/conftool/dbconfig/20221208-060551-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42560 and previous config saved to /var/cache/conftool/dbconfig/20221208-060522-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42559 and previous config saved to /var/cache/conftool/dbconfig/20221208-060500-ladsgroup.json
* 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42558 and previous config saved to /var/cache/conftool/dbconfig/20221208-055930-ladsgroup.json
* 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42557 and previous config saved to /var/cache/conftool/dbconfig/20221208-055046-ladsgroup.json
* 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42556 and previous config saved to /var/cache/conftool/dbconfig/20221208-055015-ladsgroup.json
* 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42555 and previous config saved to /var/cache/conftool/dbconfig/20221208-054953-ladsgroup.json
* 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42554 and previous config saved to /var/cache/conftool/dbconfig/20221208-054423-ladsgroup.json
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42553 and previous config saved to /var/cache/conftool/dbconfig/20221208-053541-ladsgroup.json
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42552 and previous config saved to /var/cache/conftool/dbconfig/20221208-053509-ladsgroup.json
* 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42551 and previous config saved to /var/cache/conftool/dbconfig/20221208-053447-ladsgroup.json
* 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42550 and previous config saved to /var/cache/conftool/dbconfig/20221208-053253-ladsgroup.json
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42549 and previous config saved to /var/cache/conftool/dbconfig/20221208-053236-ladsgroup.json
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 05:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 05:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42548 and previous config saved to /var/cache/conftool/dbconfig/20221208-052917-ladsgroup.json
* 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42547 and previous config saved to /var/cache/conftool/dbconfig/20221208-052705-ladsgroup.json
* 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42546 and previous config saved to /var/cache/conftool/dbconfig/20221208-052036-ladsgroup.json
* 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 03:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 02:24 bblack: lvs1017 - restart pybal manually again, back on bgp_med=101 (traffic goes back to lvs1020)
* 02:21 bblack: restarting pybal on lvs1017 manually again with bgp_med=0 (should take traffic, may or may not do so very usefully!)
* 02:05 bblack: sretest1001 - puppet disabled, manipulating routing on this host to conduct tests...
* 01:56 bblack: lvs1017 - manually setting BGP MED to 101 and starting pybal (should come back and speak BGP, but not steal traffic from lvs1020)
* 01:29 bblack: lvs1017 - disable puppet and stop pybal to fix ipv6 for now
* 01:27 bblack: lvs1017: restart pybal, attempt to fix text-ipv6 service
* 01:05 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic1" lvs at all sites - [[phab:T324336|T324336]]
* 01:00 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic2" lvs at all sites - [[phab:T324336|T324336]]
* 00:47 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "secondary" lvs at all sites - [[phab:T324336|T324336]] (5 hosts, ulsfo completed previously)
* 00:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1012.eqiad.wmnet with OS bullseye
* 00:29 bblack: lvs4010: restart pybal to test etcd key changes - [[phab:T324336|T324336]]
* 00:16 bblack: disabling puppet on all cp and lvs hosts for conftool key changes.  Please coordinate if any lvs/pybal/cpNNNN depooling/work is needed during this transition!
* 00:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=cdn
* 00:12 bblack@cumin1001: conftool action : set/weight=1; selector: service=cdn
* 00:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage
* 00:04 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage


== 2020-05-18 ==
== 2022-12-07 ==
* 23:50 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1012.eqiad.wmnet with OS bullseye
* 23:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 23:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 23:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42545 and previous config saved to /var/cache/conftool/dbconfig/20221207-233130-ladsgroup.json
* 23:12 ryankemper: Restarted `wdqs-updater` across all wdqs nodes and restarted `wdqs-categories` across all nodes except 1010 (test wdqs server) and 1009 (automated deployment server)
* 23:24 mutante: mx1001 about to run out of disk again -  apt-get clean, gzip /var/log/exim4/mainlog.1  find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not [[phab:T305567|T305567]]
* 22:55 Krinkle: Clear module_deps on dewiki (group2, old mw version, s5) to monitor regeneration
* 23:23 mutante: mx1001 - apt-get clean, gzip /var/log/exim4/mainlog.1  find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not
* 22:48 Krinkle: Clear module_deps on group0 (mostly s3) to monitor regeneration
* 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42544 and previous config saved to /var/cache/conftool/dbconfig/20221207-231623-ladsgroup.json
* 22:35 Krinkle: Clear module_deps on commonswiki (group1, s4) to monitor regeneration
* 23:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:865749{{!}}Make parsoid accept all content models. (T324711)]] (duration: 13m 57s)
* 22:33 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@4886dc3]: 0.3.32 (duration: 17m 12s)
* 23:02 samtar@deploy1002: samtar and samtar: Backport for [[gerrit:865749{{!}}Make parsoid accept all content models. (T324711)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42543 and previous config saved to /var/cache/conftool/dbconfig/20221207-230116-ladsgroup.json
* 22:18 Krinkle: Clear module_deps on s2 wikis to monitor regeneration
* 23:00 samtar@deploy1002: Started scap: Backport for [[gerrit:865749{{!}}Make parsoid accept all content models. (T324711)]]
* 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 22:15 ryankemper@deploy1001: Started deploy [wdqs/wdqs@4886dc3]: 0.3.32
* 22:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 22:02 Krinkle: Clear module_deps on hewiki (group1, s7) to monitor regeneration, ref [[phab:T247028|T247028]]
* 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 21:23 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/resourceloader/dependencystore/: {{Gerrit|I015fa5885}}, {{Gerrit|I972a93806006}} (duration: 01m 07s)
* 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 21:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
* 20:27 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@12efc14]: Update mobileapps to {{Gerrit|c960b349}} (duration: 03m 31s)
* 22:48 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@12efc14]: Update mobileapps to {{Gerrit|c960b349}}
* 22:48 TheresNoTime: Going to backport [[gerrit:865749]] to wmf/1.40.0-wmf.13 for [[phab:T324711|T324711]]
* 19:07 herron: performing rolling maintenance on kafka-main to pick up java security updates
* 22:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ic005093778d}} (duration: 01m 08s)
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42542 and previous config saved to /var/cache/conftool/dbconfig/20221207-224610-ladsgroup.json
* 18:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: {{Gerrit|Ic005093778d}} (duration: 01m 06s)
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42541 and previous config saved to /var/cache/conftool/dbconfig/20221207-224502-ladsgroup.json
* 18:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 18:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 18:38 volans: upgraded spicerack to 0.0.37-1 on cumin[12]001
* 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42540 and previous config saved to /var/cache/conftool/dbconfig/20221207-224440-ladsgroup.json
* 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 22:41 ryankemper: [[phab:T301167|T301167]] Downtimed `wdqs20[09-12]` for 7 days
* 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix English Wikipedia wordmark dimensions ([[phab:T252143|T252143]]) (duration: 01m 06s)
* 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:14 XioNoX: update domain object for 56.15.185.in-addr.arpa - [[phab:T247972|T247972]]
* 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2009.*
* 17:06 bblack: dns1001 - removing downtimes, back in service - [[phab:T241770|T241770]]
* 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2010.*
* 16:45 bstorm_: updated views on labsdb1011 for the wb_terms changes [[phab:T251598|T251598]]
* 22:35 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 16:32 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:32 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 22:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 16:17 bblack: dns1001 - reimaging for new NIC - [[phab:T241770|T241770]]
* 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:10 volans: uploaded spicerack_0.0.37-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 22:29 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:52 hnowlan: rolling codfw cassandra for java security updates
* 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42539 and previous config saved to /var/cache/conftool/dbconfig/20221207-222934-ladsgroup.json
* 15:51 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 22:28 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 22:26 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:11 Krinkle: krinkle@mc1021 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 22:25 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 14:57 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 22:25 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 14:56 hnowlan: roll-restart of sessionstore cassandra hosts for java security update
* 22:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 14:55 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42538 and previous config saved to /var/cache/conftool/dbconfig/20221207-221427-ladsgroup.json
* 14:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42537 and previous config saved to /var/cache/conftool/dbconfig/20221207-220110-ladsgroup.json
* 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42536 and previous config saved to /var/cache/conftool/dbconfig/20221207-215921-ladsgroup.json
* 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42535 and previous config saved to /var/cache/conftool/dbconfig/20221207-215712-ladsgroup.json
* 14:35 hnowlan@deploy1001: Finished deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this (duration: 01m 22s)
* 21:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 14:34 hnowlan@deploy1001: Started deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this
* 21:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 14:33 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of esams [[phab:T133821|T133821]]
* 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42534 and previous config saved to /var/cache/conftool/dbconfig/20221207-215651-ladsgroup.json
* 14:29 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqiad [[phab:T133821|T133821]]
* 21:56 TheresNoTime: UTC late backport window done
* 14:23 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqsin, ulsfo [[phab:T133821|T133821]]
* 21:51 samtar@deploy1002: backport aborted:  (duration: 00m 15s)
* 14:19 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of codfw [[phab:T133821|T133821]]
* 21:49 samtar@deploy1002: Sync cancelled.
* 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2073 while replacing it [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11216 and previous config saved to /var/cache/conftool/dbconfig/20200518-141505-kormat.json
* 21:47 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865773"
* 14:12 bblack: dns1001 - shutting down for [[phab:T241770|T241770]]
* 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42533 and previous config saved to /var/cache/conftool/dbconfig/20221207-214603-ladsgroup.json
* 14:09 volans: uploaded spicerack_0.0.36-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5003.eqsin.wmnet
* 14:07 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching back to authdns1001 (oops, that's not the server we're taking offline today!)
* 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:06 vgutierrez: upload trafficserver 8.0.7-1wm9 to apt.wm.o (buster)
* 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 14:02 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 21:43 samtar@deploy1002: samtar and stang: Backport for [[gerrit:865766{{!}}specieswiki: Install GeoData extension (T324348)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 21:43 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 13:57 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching from authdns1001 to dns1002 for [[phab:T241770|T241770]]
* 21:41 samtar@deploy1002: Started scap: Backport for [[gerrit:865766{{!}}specieswiki: Install GeoData extension (T324348)]]
* 13:29 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42532 and previous config saved to /var/cache/conftool/dbconfig/20221207-214145-ladsgroup.json
* 13:00 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: VectorTemplate: SkinTemplateToolboxEnd hook isn't deprecated - [[phab:T252906|T252906]] (duration: 01m 07s)
* 21:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 11:52 marostegui: Install 10.1.43-2 on db1122 and db1109 - [[phab:T251981|T251981]]
* 21:40 samtar@deploy1002: Finished scap: Backport for [[gerrit:865737{{!}}Remove Research Incentive survey from frwiki (T321930)]] (duration: 09m 04s)
* 11:27 Lucas_WMDE: EU SWAT done
* 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5003.eqsin.wmnet
* 11:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Wikibase/: SWAT: [[gerrit:596616{{!}}Fix core's TitleFactory not being used correctly (T252803)]] (duration: 01m 12s)
* 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
* 11:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:597010{{!}}Update GrowthExperiments mentor list page for viwiki]] (duration: 01m 06s)
* 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:596916{{!}}Make the threshold for Chinese WP to prevent publishing 5% more strict (T252786)]] (duration: 01m 06s)
* 21:34 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865742"
* 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:597033{{!}} Bumping portals to master (597033)]] (duration: 01m 06s)
* 21:32 samtar@deploy1002: samtar and dani: Backport for [[gerrit:865737{{!}}Remove Research Incentive survey from frwiki (T321930)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:597033{{!}} Bumping portals to master (597033)]] (duration: 01m 32s)
* 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS buster
* 10:37 elukey: copy prometheus-druid-exporter 0.8-1 from stretch to buster wikimedia
* 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 10:20 _joe_: upgrading purged in the remaining datacenters
* 21:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 10:07 elukey: upload druid 0.12.3-1.1 to stretch{{!}}buster-wikimedia
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42530 and previous config saved to /var/cache/conftool/dbconfig/20221207-213057-ladsgroup.json
* 10:02 vgutierrez: upload trafficserver 8.0.7-1wm8 to apt.wm.o (buster)
* 21:30 samtar@deploy1002: Started scap: Backport for [[gerrit:865737{{!}}Remove Research Incentive survey from frwiki (T321930)]]
* 09:53 _joe_: upgrading purged in codfw, ulsfo
* 21:28 samtar@deploy1002: Finished scap: Backport for [[gerrit:865070{{!}}hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071{{!}}Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]] (duration: 20m 35s)
* 09:46 mutante: contint2001 - apt-get remove --purge openjdk-11-* - [[phab:T224591|T224591]]
* 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42529 and previous config saved to /var/cache/conftool/dbconfig/20221207-212638-ladsgroup.json
* 09:43 _joe_: upload purged 0.13 to buster-wikimedia
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42528 and previous config saved to /var/cache/conftool/dbconfig/20221207-211551-ladsgroup.json
* 08:44 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42527 and previous config saved to /var/cache/conftool/dbconfig/20221207-211338-ladsgroup.json
* 08:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42526 and previous config saved to /var/cache/conftool/dbconfig/20221207-211317-ladsgroup.json
* 08:13 godog: set weight to 0 for all but objects in ms-be10[678] - [[phab:T252008|T252008]]
* 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42525 and previous config saved to /var/cache/conftool/dbconfig/20221207-211132-ladsgroup.json
* 07:57 mutante: replacing apache module with httpd module on deployment servers
* 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
* 07:47 moritzm: installing apt security updates on jessie systems
* 21:10 samtar@deploy1002: samtar and daniel: Backport for [[gerrit:865070{{!}}hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071{{!}}Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 07:36 marostegui: Remove and add pc2007 from tendril as the Act is frozen after reimage - [[phab:T250666|T250666]]
* 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42524 and previous config saved to /var/cache/conftool/dbconfig/20221207-210923-ladsgroup.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088 after upgrade', diff saved to https://phabricator.wikimedia.org/P11214 and previous config saved to /var/cache/conftool/dbconfig/20200518-072234-marostegui.json
* 21:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 07:20 marostegui: Upload MariaDB 10.4.13 to the buster repo - [[phab:T250666|T250666]]
* 21:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42523 and previous config saved to /var/cache/conftool/dbconfig/20221207-210902-ladsgroup.json
* 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 21:08 samtar@deploy1002: Started scap: Backport for [[gerrit:865070{{!}}hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529)]], [[gerrit:865071{{!}}Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)]]
* 06:41 marostegui: Stop MySQL on db2088
* 21:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 for upgrade', diff saved to https://phabricator.wikimedia.org/P11213 and previous config saved to /var/cache/conftool/dbconfig/20200518-062452-marostegui.json
* 21:02 mutante: contint1002 a2dismod mpm_event - https://phabricator.wikimedia.org/T208108 Bug: [[phab:T313832|T313832]]
* 05:55 _joe_: installing purged 0.12 on cp2027
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42522 and previous config saved to /var/cache/conftool/dbconfig/20221207-205811-ladsgroup.json
* 05:54 _joe_: uploaded purged 0.12 to apt.w.o
* 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42521 and previous config saved to /var/cache/conftool/dbconfig/20221207-205356-ladsgroup.json
* 05:00 marostegui: Stop MySQL on labsdb1011 to copy its content to backup1001 [[phab:T249188|T249188]]
* 20:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS buster
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42520 and previous config saved to /var/cache/conftool/dbconfig/20221207-204304-ladsgroup.json
* 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42519 and previous config saved to /var/cache/conftool/dbconfig/20221207-203849-ladsgroup.json
* 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42517 and previous config saved to /var/cache/conftool/dbconfig/20221207-202758-ladsgroup.json
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42516 and previous config saved to /var/cache/conftool/dbconfig/20221207-202545-ladsgroup.json
* 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42515 and previous config saved to /var/cache/conftool/dbconfig/20221207-202524-ladsgroup.json
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42514 and previous config saved to /var/cache/conftool/dbconfig/20221207-202343-ladsgroup.json
* 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42513 and previous config saved to /var/cache/conftool/dbconfig/20221207-202134-ladsgroup.json
* 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42512 and previous config saved to /var/cache/conftool/dbconfig/20221207-202113-ladsgroup.json
* 20:20 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]] (duration: 07m 03s)
* 20:13 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42511 and previous config saved to /var/cache/conftool/dbconfig/20221207-201016-ladsgroup.json
* 20:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
* 20:09 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
* 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42510 and previous config saved to /var/cache/conftool/dbconfig/20221207-200606-ladsgroup.json
* 20:00 mutante: contint* - deploying firewall changes to add contint1002 - [[phab:T313832|T313832]]
* 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42509 and previous config saved to /var/cache/conftool/dbconfig/20221207-195510-ladsgroup.json
* 19:53 mutante: registry* (docker registry HA) - adding contint1002 to allowed hosts gerrit:865680 [[phab:T313832|T313832]]
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42508 and previous config saved to /var/cache/conftool/dbconfig/20221207-195100-ladsgroup.json
* 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42507 and previous config saved to /var/cache/conftool/dbconfig/20221207-194003-ladsgroup.json
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42506 and previous config saved to /var/cache/conftool/dbconfig/20221207-193751-ladsgroup.json
* 19:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 19:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42505 and previous config saved to /var/cache/conftool/dbconfig/20221207-193730-ladsgroup.json
* 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42504 and previous config saved to /var/cache/conftool/dbconfig/20221207-193553-ladsgroup.json
* 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42503 and previous config saved to /var/cache/conftool/dbconfig/20221207-193445-ladsgroup.json
* 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42502 and previous config saved to /var/cache/conftool/dbconfig/20221207-193350-ladsgroup.json
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42501 and previous config saved to /var/cache/conftool/dbconfig/20221207-192223-ladsgroup.json
* 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42500 and previous config saved to /var/cache/conftool/dbconfig/20221207-191843-ladsgroup.json
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42499 and previous config saved to /var/cache/conftool/dbconfig/20221207-191328-ladsgroup.json
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42498 and previous config saved to /var/cache/conftool/dbconfig/20221207-190717-ladsgroup.json
* 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42497 and previous config saved to /var/cache/conftool/dbconfig/20221207-190337-ladsgroup.json
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42496 and previous config saved to /var/cache/conftool/dbconfig/20221207-185821-ladsgroup.json
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42495 and previous config saved to /var/cache/conftool/dbconfig/20221207-185210-ladsgroup.json
* 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42494 and previous config saved to /var/cache/conftool/dbconfig/20221207-184958-ladsgroup.json
* 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:49 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1026.eqiad.wmnet with OS bullseye
* 18:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42493 and previous config saved to /var/cache/conftool/dbconfig/20221207-184851-ladsgroup.json
* 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42492 and previous config saved to /var/cache/conftool/dbconfig/20221207-184830-ladsgroup.json
* 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42491 and previous config saved to /var/cache/conftool/dbconfig/20221207-184722-ladsgroup.json
* 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42490 and previous config saved to /var/cache/conftool/dbconfig/20221207-184700-ladsgroup.json
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1023.eqiad.wmnet with OS bullseye
* 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42489 and previous config saved to /var/cache/conftool/dbconfig/20221207-184315-ladsgroup.json
* 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42488 and previous config saved to /var/cache/conftool/dbconfig/20221207-183344-ladsgroup.json
* 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42487 and previous config saved to /var/cache/conftool/dbconfig/20221207-183154-ladsgroup.json
* 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42486 and previous config saved to /var/cache/conftool/dbconfig/20221207-182808-ladsgroup.json
* 18:27 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
* 18:23 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
* 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42485 and previous config saved to /var/cache/conftool/dbconfig/20221207-181838-ladsgroup.json
* 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42484 and previous config saved to /var/cache/conftool/dbconfig/20221207-181647-ladsgroup.json
* 18:06 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
* 18:04 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1026']
* 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42483 and previous config saved to /var/cache/conftool/dbconfig/20221207-180331-ladsgroup.json
* 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2002.codfw.wmnet with OS bullseye
* 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42482 and previous config saved to /var/cache/conftool/dbconfig/20221207-180140-ladsgroup.json
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42481 and previous config saved to /var/cache/conftool/dbconfig/20221207-180132-ladsgroup.json
* 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42480 and previous config saved to /var/cache/conftool/dbconfig/20221207-180119-ladsgroup.json
* 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42479 and previous config saved to /var/cache/conftool/dbconfig/20221207-180110-ladsgroup.json
* 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42478 and previous config saved to /var/cache/conftool/dbconfig/20221207-180058-ladsgroup.json
* 18:00 sukhe: restart pybal on lvs5003 to pick up bgp-med change
* 17:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
* 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:53 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1026']
* 17:50 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc] (duration: 01m 15s)
* 17:49 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc]
* 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 17:48 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc] (duration: 00m 07s)
* 17:48 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc]
* 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 [[phab:T324692|T324692]]', diff saved to https://phabricator.wikimedia.org/P42477 and previous config saved to /var/cache/conftool/dbconfig/20221207-174811-ladsgroup.json
* 17:46 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
* 17:46 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc] (duration: 79m 12s)
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42476 and previous config saved to /var/cache/conftool/dbconfig/20221207-174604-ladsgroup.json
* 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42475 and previous config saved to /var/cache/conftool/dbconfig/20221207-174551-ladsgroup.json
* 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary [[phab:T324692|T324692]]', diff saved to https://phabricator.wikimedia.org/P42474 and previous config saved to /var/cache/conftool/dbconfig/20221207-174540-ladsgroup.json
* 17:45 Amir1: Starting s1 codfw failover from db2112 to db2103 - [[phab:T324692|T324692]]
* 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 17:42 sukhe: restart pybal on lvs5005 to pick up bgp-med
* 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 17:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42473 and previous config saved to /var/cache/conftool/dbconfig/20221207-173350-ladsgroup.json
* 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42472 and previous config saved to /var/cache/conftool/dbconfig/20221207-173329-ladsgroup.json
* 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42471 and previous config saved to /var/cache/conftool/dbconfig/20221207-173057-ladsgroup.json
* 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42470 and previous config saved to /var/cache/conftool/dbconfig/20221207-173045-ladsgroup.json
* 17:27 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
* 17:26 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
* 17:26 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1026.eqiad.wmnet with OS bullseye
* 17:25 sukhe: running homer for Gerrit: 865712
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42469 and previous config saved to /var/cache/conftool/dbconfig/20221207-171822-ladsgroup.json
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42468 and previous config saved to /var/cache/conftool/dbconfig/20221207-171803-ladsgroup.json
* 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5002.eqsin.wmnet
* 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42467 and previous config saved to /var/cache/conftool/dbconfig/20221207-171551-ladsgroup.json
* 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42466 and previous config saved to /var/cache/conftool/dbconfig/20221207-171538-ladsgroup.json
* 17:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
* 17:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 [[phab:T324692|T324692]]', diff saved to https://phabricator.wikimedia.org/P42465 and previous config saved to /var/cache/conftool/dbconfig/20221207-171416-ladsgroup.json
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42464 and previous config saved to /var/cache/conftool/dbconfig/20221207-171342-ladsgroup.json
* 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42463 and previous config saved to /var/cache/conftool/dbconfig/20221207-171326-ladsgroup.json
* 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42462 and previous config saved to /var/cache/conftool/dbconfig/20221207-171321-ladsgroup.json
* 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42461 and previous config saved to /var/cache/conftool/dbconfig/20221207-171305-ladsgroup.json
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
* 17:12 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 [[phab:T324692|T324692]]
* 17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 [[phab:T324692|T324692]]
* 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
* 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
* 17:08 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5002.eqsin.wmnet
* 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42460 and previous config saved to /var/cache/conftool/dbconfig/20221207-170316-ladsgroup.json
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42459 and previous config saved to /var/cache/conftool/dbconfig/20221207-170256-ladsgroup.json
* 17:01 jiji@deploy1002: Finished scap: Backport for [[gerrit:865123{{!}}ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581)]] (duration: 14m 46s)
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
* 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42458 and previous config saved to /var/cache/conftool/dbconfig/20221207-165815-ladsgroup.json
* 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42457 and previous config saved to /var/cache/conftool/dbconfig/20221207-165758-ladsgroup.json
* 16:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 16:56 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 16:55 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
* 16:55 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
* 16:55 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
* 16:48 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865123{{!}}ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42456 and previous config saved to /var/cache/conftool/dbconfig/20221207-164809-ladsgroup.json
* 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42455 and previous config saved to /var/cache/conftool/dbconfig/20221207-164748-ladsgroup.json
* 16:46 jiji@deploy1002: Started scap: Backport for [[gerrit:865123{{!}}ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581)]]
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42454 and previous config saved to /var/cache/conftool/dbconfig/20221207-164308-ladsgroup.json
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42453 and previous config saved to /var/cache/conftool/dbconfig/20221207-164258-ladsgroup.json
* 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42452 and previous config saved to /var/cache/conftool/dbconfig/20221207-164252-ladsgroup.json
* 16:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
* 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 16:42 sukhe: restart pybal on lvs5002
* 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
* 16:38 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.240/28 next-hop 10.132.0.6: [[phab:T322048|T322048]]
* 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42451 and previous config saved to /var/cache/conftool/dbconfig/20221207-163242-ladsgroup.json
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42450 and previous config saved to /var/cache/conftool/dbconfig/20221207-163031-ladsgroup.json
* 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 16:29 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42449 and previous config saved to /var/cache/conftool/dbconfig/20221207-162802-ladsgroup.json
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42448 and previous config saved to /var/cache/conftool/dbconfig/20221207-162745-ladsgroup.json
* 16:27 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc]
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42447 and previous config saved to /var/cache/conftool/dbconfig/20221207-162553-ladsgroup.json
* 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42446 and previous config saved to /var/cache/conftool/dbconfig/20221207-162533-ladsgroup.json
* 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 16:25 aqu: Deploying analytics/refinery (HDFS usage scripts)
* 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 16:24 sukhe: run homer in cr*-eqsin for Gerrit: 865615
* 16:09 denisse: sync rancid in netmon2001 and netmon2002
* 16:08 denisse: sync librenms RRD in netmon2002
* 16:08 jiji@deploy1002: Finished scap: Backport for [[gerrit:865122{{!}}ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581)]] (duration: 10m 59s)
* 16:06 sukhe: run homer in cr*-eqsin for Gerrit: 865660
* 16:02 denisse: Sync LibreNMS RRD in netmon2001
* 15:59 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865122{{!}}ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:57 jiji@deploy1002: Started scap: Backport for [[gerrit:865122{{!}}ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581)]]
* 15:46 jiji@deploy1002: Finished scap: Backport for [[gerrit:865121{{!}}ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581)]] (duration: 08m 29s)
* 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS buster
* 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 15:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 15:39 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865121{{!}}ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:37 jiji@deploy1002: Started scap: Backport for [[gerrit:865121{{!}}ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581)]]
* 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS buster
* 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 15:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 15:24 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
* 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
* 15:11 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 15:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
* 15:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
* 14:56 krinkle@deploy1002: Finished deploy [performance/navtiming@6caa033]: (no justification provided) (duration: 00m 07s)
* 14:56 krinkle@deploy1002: Started deploy [performance/navtiming@6caa033]: (no justification provided)
* 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS buster
* 14:38 XioNoX: draining Arelion eqiad-codfw circuit for optic replacement
* 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS buster
* 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5002.wikimedia.org
* 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 14:27 moritzm: restarting ntpd
* 14:26 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 14:24 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 14:20 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5002.wikimedia.org
* 14:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
* 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
* 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
* 14:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
* 13:42 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]] (duration: 07m 45s)
* 13:34 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 13:25 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
* 13:22 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42443 and previous config saved to /var/cache/conftool/dbconfig/20221207-131858-marostegui.json
* 13:09 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
* 13:09 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
* 12:37 moritzm: upgrading mwmaint servers to PHP 7.4.33
* 12:33 moritzm: upgrading deployment servers to PHP 7.4.33
* 12:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:32 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:30 moritzm: upgrading cloudweb to PHP 7.4.33
* 12:28 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:28 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
* 12:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
* 12:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
* 12:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
* 12:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 12:15 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:15 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:12 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:12 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 12:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 12:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 12:10 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
* 12:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 12:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 12:09 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 12:09 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 11:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
* 11:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
* 11:59 moritzm: imported rsyslog 8.2102.0-2+deb11u1~buster1 to component/rsyslog-openssl [[phab:T324623|T324623]]
* 11:57 moritzm: imported librelp 1.10.0-1~buster1 to component/rsyslog-openssl [[phab:T324623|T324623]]
* 11:55 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
* 11:55 sukhe: running authdns-update for Gerrit: 865605
* 11:55 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
* 11:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 11:54 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 11:51 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks (duration: 00m 14s)
* 11:50 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks
* 11:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 11:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 11:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:11 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin2001.codfw.wmnet
* 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin2001.codfw.wmnet on all recursors
* 11:06 volans@cumin2002: START - Cookbook sre.dns.wipe-cache cloudcumin2001.codfw.wmnet on all recursors
* 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:06 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
* 11:05 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
* 11:03 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:01 volans@cumin2002: START - Cookbook sre.dns.netbox
* 11:01 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host cloudcumin2001.codfw.wmnet
* 10:58 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin1001.eqiad.wmnet
* 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin1001.eqiad.wmnet on all recursors
* 10:53 volans@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcumin1001.eqiad.wmnet on all recursors
* 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:53 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
* 10:52 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
* 10:50 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudcumin1001.eqiad.wmnet
* 10:35 jiji@deploy1002: Finished scap: Backport for [[gerrit:865119{{!}}ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581)]] (duration: 10m 29s)
* 10:27 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865119{{!}}ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:26 claime: rebooted contin1001.eqiad.wmnet
* 10:25 jiji@deploy1002: Started scap: Backport for [[gerrit:865119{{!}}ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581)]]
* 10:17 jiji@deploy1002: Finished scap: Backport for [[gerrit:865118{{!}}ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581)]] (duration: 10m 48s)
* 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
* 10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
* 10:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 714
* 10:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
* 10:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16276
* 10:09 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:09 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865118{{!}}ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 10:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:07 jiji@deploy1002: Started scap: Backport for [[gerrit:865118{{!}}ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581)]]
* 10:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 10:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35320
* 10:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35320
* 10:00 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 8932
* 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
* 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
* 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13150
* 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138064
* 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138064
* 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
* 09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
* 09:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
* 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31800
* 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
* 09:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 31800
* 09:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
* 09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45430
* 09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45430
* 09:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7568
* 09:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7568
* 09:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:36 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:35 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:35 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:34 jiji@deploy1002: Finished scap: Backport for [[gerrit:865117{{!}}ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581)]] (duration: 09m 08s)
* 09:27 jiji@deploy1002: jiji and jiji: Backport for [[gerrit:865117{{!}}ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 09:25 jiji@deploy1002: Started scap: Backport for [[gerrit:865117{{!}}ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581)]]
* 09:23 jiji@deploy1002: backport aborted: (duration: 00m 18s)
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 395570
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 395570
* 09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 397715
* 09:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 397715
* 09:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
* 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1108.eqiad.wmnet
* 09:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
* 09:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42439 and previous config saved to /var/cache/conftool/dbconfig/20221207-071831-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42438 and previous config saved to /var/cache/conftool/dbconfig/20221207-070326-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42437 and previous config saved to /var/cache/conftool/dbconfig/20221207-064821-root.json
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42436 and previous config saved to /var/cache/conftool/dbconfig/20221207-063316-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42435 and previous config saved to /var/cache/conftool/dbconfig/20221207-061810-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42434 and previous config saved to /var/cache/conftool/dbconfig/20221207-060305-root.json
* 05:58 marostegui: Drop phab1001 grants from m3 databases [[phab:T323418|T323418]]
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42433 and previous config saved to /var/cache/conftool/dbconfig/20221207-054759-root.json
* 03:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1011.eqiad.wmnet with OS bullseye
* 03:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
* 03:15 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
* 02:49 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1011.eqiad.wmnet with OS bullseye
* 02:39 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1011']
* 02:33 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1011']
* 00:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 00:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1023.eqiad.wmnet with OS bullseye


== 2020-05-16 ==
* 22:04 Krinkle: krinkle@mc1022 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 21:56 Krinkle: krinkle@mc1019 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 20:23 Krinkle: krinkle@mc1034,mc1035,mc1036 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 20:04 Krinkle: krinkle@mc1033 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:57 Krinkle: krinkle@mc1032 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:51 Krinkle: krinkle@mc1031 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:42 Krinkle: krinkle@mc1030 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:25 Krinkle: krinkle@mc1029 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:10 Krinkle: krinkle@mc1028 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 18:58 Krinkle: krinkle@mc1027 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:54 Krinkle: krinkle@mc1026 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:30 Krinkle: krinkle@mc1024 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:24 Krinkle: krinkle@mc1025 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 17:56 Krinkle: krinkle@mc1023 Pruning old echo:seen: Redis keys that didn't use a ttl yet, ref [[phab:T252945|T252945]]
* 17:49 Krinkle: krinkle@mwmaint1002: Running cleanupRemovedModules.php to prune old module_deps rows [[phab:T113916|T113916]]
* 17:24 Krinkle: krinkle@mc1020 Prune old echo:seen: keys that have ttl:-1 from Redis main stash, ref [[phab:T252945|T252945]]
* 15:16 Krinkle: krinkle@mc1020 Looking at why there are still over 2M echo:seen keys in redis main stash
* 00:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|I046868190b472}} (duration: 01m 13s)
* 00:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:10 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:06 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer

== 2022-12-06 ==
* 23:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 23:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
* 23:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 22:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:37 tgr_: UTC late backports done
* 22:36 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:26 tgr@deploy1002: Finished scap: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]] (duration: 18m 58s)
* 22:09 tgr@deploy1002: tgr and tgr: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:07 tgr@deploy1002: Started scap: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]]
* 21:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 21:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1029.eqiad.wmnet with OS bullseye
* 20:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
* 20:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
* 20:25 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 20:25 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
* 20:24 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 20:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1028.eqiad.wmnet with OS bullseye
* 20:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1029']
* 20:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
* 20:13 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1029']
* 20:06 eileen: civicrm upgraded from {{Gerrit|c9761fee}} to {{Gerrit|3ae68ab4}}
* 20:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
* 20:01 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
* 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bullseye
* 19:57 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
* 19:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bullseye
* 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bullseye
* 19:45 ejegg: payments-wiki upgraded from {{Gerrit|a875f2b9}} to {{Gerrit|1914b6c7}}
* 19:40 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 19:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
* 19:38 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1027.eqiad.wmnet with OS bullseye
* 19:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
* 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
* 19:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]] (duration: 12m 43s)
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
* 19:32 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 19:22 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 19:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1028']
* 19:21 ladsgroup@deploy1002: ladsgroup and jdlrobson: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 19:19 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]]
* 19:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 19:16 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
* 19:14 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 19:12 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bullseye
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bullseye
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bullseye
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5006']
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5007']
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5005']
* 18:56 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
* 18:55 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:45 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5007']
* 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5006']
* 18:44 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5005']
* 18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns5003']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5006']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5005']
* 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1027']
* 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 18:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns5003']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5006']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5005']
* 18:28 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['logstash1028']
* 18:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 18:27 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 18:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 18:26 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:18 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:07 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
* 18:07 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
* 18:07 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 09s)
* 18:06 robh@cumin2002: START - Cookbook sre.dns.netbox
* 18:05 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
* 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:02 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 18:02 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 17:56 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1027.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5003
* 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5003
* 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5007
* 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5007
* 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5006
* 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 17:24 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5006
* 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5005
* 17:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5005
* 17:22 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 17:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
* 17:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
* 17:17 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
* 17:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 17:12 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
* 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 17:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 16:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 16:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 16:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 16:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 16:46 kostajh: UTC afternoon backports done
* 16:44 kharlan@deploy1002: Finished scap: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]] (duration: 10m 54s)
* 16:35 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:33 kharlan@deploy1002: Started scap: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]]
* 16:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
* 16:30 kharlan@deploy1002: Finished scap: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]] (duration: 10m 14s)
* 16:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 16:21 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5005
* 16:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5005
* 16:21 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
* 16:19 kharlan@deploy1002: Started scap: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]]
* 16:18 kharlan@deploy1002: backport aborted:  (duration: 02m 53s)
* 16:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
* 16:15 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:13 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] (duration: 29m 43s)
* 16:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:10 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:02 kharlan@deploy1002: kharlan and urbanecm and kharlan: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.
* 15:45 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 17s)
* 15:44 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
* 15:43 kharlan@deploy1002: Started scap: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]]
* 15:41 reedy@deploy1002: Synchronized php-1.40.0-wmf.13/extensions/SecurePoll/includes/Pages/ListPager.php: [[phab:T324556|T324556]] (duration: 07m 01s)
* 15:33 reedy@deploy1002: Synchronized php-1.40.0-wmf.12/extensions/SecurePoll/includes/Pages/ListPager.php: [[phab:T324556|T324556]] (duration: 07m 13s)
* 15:20 kharlan@deploy1002: Finished scap: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]] (duration: 10m 48s)
* 15:13 kharlan@deploy1002: kharlan and urbanecm: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:10 kharlan@deploy1002: Started scap: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]]
* 14:52 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]] (duration: 10m 07s)
* 14:44 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:42 kharlan@deploy1002: Started scap: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]]
* 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 14:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:31 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 14:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 14:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:25 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:21 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]] (duration: 09m 04s)
* 14:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 14:13 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:12 kharlan@deploy1002: Started scap: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]]
* 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 13:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 13:00 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]] (duration: 07m 57s)
* 12:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:54 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 12:52 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 12:52 kharlan@deploy1002: Started scap: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]]
* 12:49 moritzm: installing glibc security updates on buster
* 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:29 jnuche@deploy1002: Pruned MediaWiki: 1.40.0-wmf.10 (duration: 02m 09s)
* 12:27 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:27 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx (exit_code=1) rolling restart_daemons on A:wcqs-public
* 12:27 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]] (duration: 05m 52s)
* 12:21 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 12:14 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 12:10 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 10:59 kostajh: UTC morning deploys done
* 10:56 moritzm: installing freetype security updates
* 10:48 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] (duration: 31m 25s)
* 10:36 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:16 kharlan@deploy1002: Started scap: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]]
* 09:37 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]] (duration: 28m 05s)
* 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:09 kharlan@deploy1002: Started scap: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]]
* 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42426 and previous config saved to /var/cache/conftool/dbconfig/20221206-064402-ladsgroup.json
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42425 and previous config saved to /var/cache/conftool/dbconfig/20221206-062856-ladsgroup.json
* 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42424 and previous config saved to /var/cache/conftool/dbconfig/20221206-061349-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42423 and previous config saved to /var/cache/conftool/dbconfig/20221206-055843-ladsgroup.json
* 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42422 and previous config saved to /var/cache/conftool/dbconfig/20221206-054030-ladsgroup.json
* 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42421 and previous config saved to /var/cache/conftool/dbconfig/20221206-053911-ladsgroup.json
* 05:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42420 and previous config saved to /var/cache/conftool/dbconfig/20221206-053850-ladsgroup.json
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42419 and previous config saved to /var/cache/conftool/dbconfig/20221206-052523-ladsgroup.json
* 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42418 and previous config saved to /var/cache/conftool/dbconfig/20221206-052343-ladsgroup.json
* 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42417 and previous config saved to /var/cache/conftool/dbconfig/20221206-051016-ladsgroup.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42416 and previous config saved to /var/cache/conftool/dbconfig/20221206-050837-ladsgroup.json
* 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42415 and previous config saved to /var/cache/conftool/dbconfig/20221206-045510-ladsgroup.json
* 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42414 and previous config saved to /var/cache/conftool/dbconfig/20221206-045330-ladsgroup.json
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42413 and previous config saved to /var/cache/conftool/dbconfig/20221206-043348-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42412 and previous config saved to /var/cache/conftool/dbconfig/20221206-043326-ladsgroup.json
* 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42411 and previous config saved to /var/cache/conftool/dbconfig/20221206-042850-ladsgroup.json
* 04:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 04:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42410 and previous config saved to /var/cache/conftool/dbconfig/20221206-042828-ladsgroup.json
* 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42409 and previous config saved to /var/cache/conftool/dbconfig/20221206-041820-ladsgroup.json
* 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42408 and previous config saved to /var/cache/conftool/dbconfig/20221206-041322-ladsgroup.json
* 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42407 and previous config saved to /var/cache/conftool/dbconfig/20221206-040313-ladsgroup.json
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42406 and previous config saved to /var/cache/conftool/dbconfig/20221206-035815-ladsgroup.json
* 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42405 and previous config saved to /var/cache/conftool/dbconfig/20221206-034806-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42404 and previous config saved to /var/cache/conftool/dbconfig/20221206-034309-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42403 and previous config saved to /var/cache/conftool/dbconfig/20221206-032818-ladsgroup.json
* 03:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42402 and previous config saved to /var/cache/conftool/dbconfig/20221206-032756-ladsgroup.json
* 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42401 and previous config saved to /var/cache/conftool/dbconfig/20221206-031250-ladsgroup.json
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42400 and previous config saved to /var/cache/conftool/dbconfig/20221206-025831-ladsgroup.json
* 02:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42399 and previous config saved to /var/cache/conftool/dbconfig/20221206-025821-ladsgroup.json
* 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42398 and previous config saved to /var/cache/conftool/dbconfig/20221206-025743-ladsgroup.json
* 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42397 and previous config saved to /var/cache/conftool/dbconfig/20221206-024314-ladsgroup.json
* 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42396 and previous config saved to /var/cache/conftool/dbconfig/20221206-024236-ladsgroup.json
* 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42395 and previous config saved to /var/cache/conftool/dbconfig/20221206-022817-ladsgroup.json
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42394 and previous config saved to /var/cache/conftool/dbconfig/20221206-022808-ladsgroup.json
* 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42393 and previous config saved to /var/cache/conftool/dbconfig/20221206-021638-ladsgroup.json
* 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42392 and previous config saved to /var/cache/conftool/dbconfig/20221206-021617-ladsgroup.json
* 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42391 and previous config saved to /var/cache/conftool/dbconfig/20221206-021310-ladsgroup.json
* 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42390 and previous config saved to /var/cache/conftool/dbconfig/20221206-021301-ladsgroup.json
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42389 and previous config saved to /var/cache/conftool/dbconfig/20221206-020110-ladsgroup.json
* 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42388 and previous config saved to /var/cache/conftool/dbconfig/20221206-015757-ladsgroup.json
* 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42387 and previous config saved to /var/cache/conftool/dbconfig/20221206-014604-ladsgroup.json
* 01:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42386 and previous config saved to /var/cache/conftool/dbconfig/20221206-014251-ladsgroup.json
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42385 and previous config saved to /var/cache/conftool/dbconfig/20221206-014046-ladsgroup.json
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42384 and previous config saved to /var/cache/conftool/dbconfig/20221206-014038-ladsgroup.json
* 01:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 01:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42383 and previous config saved to /var/cache/conftool/dbconfig/20221206-014017-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42382 and previous config saved to /var/cache/conftool/dbconfig/20221206-013057-ladsgroup.json
* 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42381 and previous config saved to /var/cache/conftool/dbconfig/20221206-012812-ladsgroup.json
* 01:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 01:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42380 and previous config saved to /var/cache/conftool/dbconfig/20221206-012750-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42378 and previous config saved to /var/cache/conftool/dbconfig/20221206-012510-ladsgroup.json
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42377 and previous config saved to /var/cache/conftool/dbconfig/20221206-011244-ladsgroup.json
* 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42376 and previous config saved to /var/cache/conftool/dbconfig/20221206-011128-ladsgroup.json
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42375 and previous config saved to /var/cache/conftool/dbconfig/20221206-011033-ladsgroup.json
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42374 and previous config saved to /var/cache/conftool/dbconfig/20221206-011003-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42373 and previous config saved to /var/cache/conftool/dbconfig/20221206-005737-ladsgroup.json
* 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42372 and previous config saved to /var/cache/conftool/dbconfig/20221206-005526-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42371 and previous config saved to /var/cache/conftool/dbconfig/20221206-005457-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42370 and previous config saved to /var/cache/conftool/dbconfig/20221206-005401-ladsgroup.json
* 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42369 and previous config saved to /var/cache/conftool/dbconfig/20221206-005339-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42368 and previous config saved to /var/cache/conftool/dbconfig/20221206-005244-ladsgroup.json
* 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42367 and previous config saved to /var/cache/conftool/dbconfig/20221206-005223-ladsgroup.json
* 00:51 cstone: payments-wiki upgraded from {{Gerrit|b613ddfb}} to {{Gerrit|0cd7e779}}
* 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42366 and previous config saved to /var/cache/conftool/dbconfig/20221206-004231-ladsgroup.json
* 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42365 and previous config saved to /var/cache/conftool/dbconfig/20221206-003833-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42364 and previous config saved to /var/cache/conftool/dbconfig/20221206-003716-ladsgroup.json
* 00:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42363 and previous config saved to /var/cache/conftool/dbconfig/20221206-002945-ladsgroup.json
* 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42362 and previous config saved to /var/cache/conftool/dbconfig/20221206-002326-ladsgroup.json
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42361 and previous config saved to /var/cache/conftool/dbconfig/20221206-002210-ladsgroup.json
* 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42360 and previous config saved to /var/cache/conftool/dbconfig/20221206-001438-ladsgroup.json
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42359 and previous config saved to /var/cache/conftool/dbconfig/20221206-000820-ladsgroup.json
* 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42358 and previous config saved to /var/cache/conftool/dbconfig/20221206-000703-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42357 and previous config saved to /var/cache/conftool/dbconfig/20221206-000654-ladsgroup.json
* 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42356 and previous config saved to /var/cache/conftool/dbconfig/20221206-000633-ladsgroup.json
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42355 and previous config saved to /var/cache/conftool/dbconfig/20221206-000444-ladsgroup.json
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42354 and previous config saved to /var/cache/conftool/dbconfig/20221206-000329-ladsgroup.json


== 2020-05-15 ==
== 2022-12-05 ==
* 23:50 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
* 23:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:57 tzatziki: removing 2 files for legal compliance
* 23:46 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
* 23:46 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:46 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
* 23:35 ryankemper: Pooled wdqs2007 following successful query tests (all data transfers are done now)
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
* 22:53 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1b1578a57ef5}} (duration: 01m 07s)
* 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates (duration: 02m 30s)
* 22:51 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Iaa240eb8cf9}} (duration: 01m 06s)
* 23:44 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates
* 21:41 ryankemper: depooled wdqs2007 while it catches up on lag
* 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42349 and previous config saved to /var/cache/conftool/dbconfig/20221205-234425-ladsgroup.json
* 21:40 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:41 tzatziki: removing 5 files for legal compliance
* 20:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42348 and previous config saved to /var/cache/conftool/dbconfig/20221205-233620-ladsgroup.json
* 20:33 ryankemper: pooled wdqs2003 and wdqs1007 following successful query tests
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42347 and previous config saved to /var/cache/conftool/dbconfig/20221205-233316-ladsgroup.json
* 19:46 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|If0fd1b51}} (duration: 01m 08s)
* 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42346 and previous config saved to /var/cache/conftool/dbconfig/20221205-232453-ladsgroup.json
* 18:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 18:34 ryankemper: depooled wdqs2003 while lag catches up
* 23:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 18:32 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42345 and previous config saved to /var/cache/conftool/dbconfig/20221205-232432-ladsgroup.json
* 17:55 vgutierrez: upload acme-chief 0.25 to apt.wm.o (buster) - [[phab:T252881|T252881]]
* 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42344 and previous config saved to /var/cache/conftool/dbconfig/20221205-232113-ladsgroup.json
* 17:27 XioNoX: renumber cr2-eqord:xe-0/1/1 to xe-0/1/3 - [[phab:T221259|T221259]]
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42343 and previous config saved to /var/cache/conftool/dbconfig/20221205-231948-ladsgroup.json
* 17:02 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
* 17:01 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
* 17:00 ryankemper: depooled wdqs1007 in preparation for impending wdqs data xfer
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42342 and previous config saved to /var/cache/conftool/dbconfig/20221205-231926-ladsgroup.json
* 16:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42341 and previous config saved to /var/cache/conftool/dbconfig/20221205-231809-ladsgroup.json
* 16:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 16:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 16:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42340 and previous config saved to /var/cache/conftool/dbconfig/20221205-231608-ladsgroup.json
* 16:02 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42339 and previous config saved to /var/cache/conftool/dbconfig/20221205-231556-ladsgroup.json
* 15:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:56 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:52 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42338 and previous config saved to /var/cache/conftool/dbconfig/20221205-231535-ladsgroup.json
* 15:49 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42337 and previous config saved to /var/cache/conftool/dbconfig/20221205-230925-ladsgroup.json
* 15:45 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42336 and previous config saved to /var/cache/conftool/dbconfig/20221205-230419-ladsgroup.json
* 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42335 and previous config saved to /var/cache/conftool/dbconfig/20221205-230102-ladsgroup.json
* 15:40 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42334 and previous config saved to /var/cache/conftool/dbconfig/20221205-230028-ladsgroup.json
* 15:36 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42333 and previous config saved to /var/cache/conftool/dbconfig/20221205-225419-ladsgroup.json
* 15:32 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42332 and previous config saved to /var/cache/conftool/dbconfig/20221205-224913-ladsgroup.json
* 15:31 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42331 and previous config saved to /var/cache/conftool/dbconfig/20221205-224555-ladsgroup.json
* 15:27 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42330 and previous config saved to /var/cache/conftool/dbconfig/20221205-224522-ladsgroup.json
* 14:19 cdanis: reverting sysctl net.ipv4.udp_mem to original on netflow3001
* 22:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:18 cdanis: re-enable puppet on netflow*
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42329 and previous config saved to /var/cache/conftool/dbconfig/20221205-223912-ladsgroup.json
* 14:14 cdanis: disable puppet on netflow*
* 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42328 and previous config saved to /var/cache/conftool/dbconfig/20221205-223406-ladsgroup.json
* 14:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42326 and previous config saved to /var/cache/conftool/dbconfig/20221205-223140-ladsgroup.json
* 14:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
* 13:47 ema: cp2029, cp3050: varnish-fe-restart to clear 'child restarted' alerts
* 22:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
* 13:47 vgutierrez: downgrade ats to version 8.0.7-1wm7 on cp4032
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42325 and previous config saved to /var/cache/conftool/dbconfig/20221205-223119-ladsgroup.json
* 13:42 vgutierrez: upgrade ats to version 8.0.7-1wm8 on cp4032
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42324 and previous config saved to /var/cache/conftool/dbconfig/20221205-223049-ladsgroup.json
* 13:37 mutante: rsyncing gerrit git data from gerrit1001 to gerrit1002 ([[phab:T200739|T200739]])
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42323 and previous config saved to /var/cache/conftool/dbconfig/20221205-223015-ladsgroup.json
* 13:13 cdanis: increase samplicator recvbuf on netflow3001 & restart samplicator
* 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42322 and previous config saved to /var/cache/conftool/dbconfig/20221205-222903-ladsgroup.json
* 13:01 cdanis: increasing sysctl net.ipv4.udp_mem on netflow3001
* 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:57 vgutierrez: upload trafficserver 8.0.7-1wm7 to apt.wm.o (buster)
* 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:21 ema: cp2029: attempt forced discard of stuck VCL [[phab:T236754|T236754]]
* 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42321 and previous config saved to /var/cache/conftool/dbconfig/20221205-222852-ladsgroup.json
* 09:09 elukey: restart druid brokers on druid100[4-6] - locked up due to datasources dropped - [[phab:T226035|T226035]]
* 22:24 tzatziki: removing 1 file for legal compliance
* 08:51 ema: cp2029: try out varnish 5.1.3-1wm15 [[phab:T236754|T236754]]
* 22:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 07:36 XioNoX: bumps prefix limit for AS16735 in eqiad
* 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 05:35 jynus: stop replication on pc2009, pc2010 for benchmarking [[phab:T252761|T252761]]
* 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 04:53 volker-e@deploy1001: Finished deploy [design/style-guide@dc956a3]: Deploy design/style-guide:  (duration: 00m 10s)
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42320 and previous config saved to /var/cache/conftool/dbconfig/20221205-221612-ladsgroup.json
* 04:52 volker-e@deploy1001: Started deploy [design/style-guide@dc956a3]: Deploy design/style-guide:
* 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42319 and previous config saved to /var/cache/conftool/dbconfig/20221205-221346-ladsgroup.json
* 04:42 vgutierrez: repool cp5006
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42317 and previous config saved to /var/cache/conftool/dbconfig/20221205-220105-ladsgroup.json
* 04:28 vgutierrez: depool and reboot cp5006
* 22:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
* 21:59 mutante: deleting special DNS entries for "phab1001-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox and syncing netbox data - [[phab:T296022|T296022]]
* 21:58 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42316 and previous config saved to /var/cache/conftool/dbconfig/20221205-215839-ladsgroup.json
* 21:55 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 21:55 mutante: deleting special DNS entries for "phab1001-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox - [[phab:T280597|T280597]]
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42315 and previous config saved to /var/cache/conftool/dbconfig/20221205-215436-ladsgroup.json
* 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42314 and previous config saved to /var/cache/conftool/dbconfig/20221205-215415-ladsgroup.json
* 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42313 and previous config saved to /var/cache/conftool/dbconfig/20221205-214801-ladsgroup.json
* 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42312 and previous config saved to /var/cache/conftool/dbconfig/20221205-214740-ladsgroup.json
* 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42311 and previous config saved to /var/cache/conftool/dbconfig/20221205-214558-ladsgroup.json
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42310 and previous config saved to /var/cache/conftool/dbconfig/20221205-214333-ladsgroup.json
* 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42309 and previous config saved to /var/cache/conftool/dbconfig/20221205-214332-ladsgroup.json
* 21:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
* 21:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
* 21:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42308 and previous config saved to /var/cache/conftool/dbconfig/20221205-214255-ladsgroup.json
* 21:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42307 and previous config saved to /var/cache/conftool/dbconfig/20221205-214120-ladsgroup.json
* 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42306 and previous config saved to /var/cache/conftool/dbconfig/20221205-214058-ladsgroup.json
* 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42305 and previous config saved to /var/cache/conftool/dbconfig/20221205-213908-ladsgroup.json
* 21:33 TheresNoTime: close UTC late backport window
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42304 and previous config saved to /var/cache/conftool/dbconfig/20221205-213233-ladsgroup.json
* 21:31 samtar@deploy1002: Finished scap: Backport for [[gerrit:864724|Adjust to changes to redlink behavior from parsoid (T324352)]] (duration: 09m 05s)
* 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42303 and previous config saved to /var/cache/conftool/dbconfig/20221205-212748-ladsgroup.json
* 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42302 and previous config saved to /var/cache/conftool/dbconfig/20221205-212552-ladsgroup.json
* 21:24 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:864724|Adjust to changes to redlink behavior from parsoid (T324352)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42301 and previous config saved to /var/cache/conftool/dbconfig/20221205-212402-ladsgroup.json
* 21:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:22 samtar@deploy1002: Started scap: Backport for [[gerrit:864724|Adjust to changes to redlink behavior from parsoid (T324352)]]
* 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42300 and previous config saved to /var/cache/conftool/dbconfig/20221205-211727-ladsgroup.json
* 21:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:856552|Use new DiscussionTools heading markup on group0 wikis (T314714)]] (duration: 09m 55s)
* 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P42299 and previous config saved to /var/cache/conftool/dbconfig/20221205-211405-ladsgroup.json
* 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42298 and previous config saved to /var/cache/conftool/dbconfig/20221205-211242-ladsgroup.json
* 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42297 and previous config saved to /var/cache/conftool/dbconfig/20221205-211045-ladsgroup.json
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42296 and previous config saved to /var/cache/conftool/dbconfig/20221205-210855-ladsgroup.json
* 21:08 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:856552|Use new DiscussionTools heading markup on group0 wikis (T314714)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:07 samtar@deploy1002: Started scap: Backport for [[gerrit:856552|Use new DiscussionTools heading markup on group0 wikis (T314714)]]
* 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42295 and previous config saved to /var/cache/conftool/dbconfig/20221205-210220-ladsgroup.json
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42294 and previous config saved to /var/cache/conftool/dbconfig/20221205-205859-ladsgroup.json
* 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42293 and previous config saved to /var/cache/conftool/dbconfig/20221205-205735-ladsgroup.json
* 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42292 and previous config saved to /var/cache/conftool/dbconfig/20221205-205610-ladsgroup.json
* 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42291 and previous config saved to /var/cache/conftool/dbconfig/20221205-205547-ladsgroup.json
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42290 and previous config saved to /var/cache/conftool/dbconfig/20221205-205537-ladsgroup.json
* 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42289 and previous config saved to /var/cache/conftool/dbconfig/20221205-205324-ladsgroup.json
* 20:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 20:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42288 and previous config saved to /var/cache/conftool/dbconfig/20221205-205303-ladsgroup.json
* 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts phab1001.eqiad.wmnet
* 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
* 20:44 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42287 and previous config saved to /var/cache/conftool/dbconfig/20221205-204352-ladsgroup.json
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42286 and previous config saved to /var/cache/conftool/dbconfig/20221205-204034-ladsgroup.json
* 20:38 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42285 and previous config saved to /var/cache/conftool/dbconfig/20221205-203756-ladsgroup.json
* 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P42284 and previous config saved to /var/cache/conftool/dbconfig/20221205-202846-ladsgroup.json
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42283 and previous config saved to /var/cache/conftool/dbconfig/20221205-202528-ladsgroup.json
* 20:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts phab1001.eqiad.wmnet
* 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42282 and previous config saved to /var/cache/conftool/dbconfig/20221205-202250-ladsgroup.json
* 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42281 and previous config saved to /var/cache/conftool/dbconfig/20221205-202029-ladsgroup.json
* 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42280 and previous config saved to /var/cache/conftool/dbconfig/20221205-202008-ladsgroup.json
* 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42279 and previous config saved to /var/cache/conftool/dbconfig/20221205-201831-ladsgroup.json
* 20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42278 and previous config saved to /var/cache/conftool/dbconfig/20221205-201810-ladsgroup.json
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42277 and previous config saved to /var/cache/conftool/dbconfig/20221205-201021-ladsgroup.json
* 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42276 and previous config saved to /var/cache/conftool/dbconfig/20221205-200755-ladsgroup.json
* 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42275 and previous config saved to /var/cache/conftool/dbconfig/20221205-200743-ladsgroup.json
* 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42274 and previous config saved to /var/cache/conftool/dbconfig/20221205-200530-ladsgroup.json
* 20:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 20:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42273 and previous config saved to /var/cache/conftool/dbconfig/20221205-200501-ladsgroup.json
* 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42272 and previous config saved to /var/cache/conftool/dbconfig/20221205-200303-ladsgroup.json
* 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
* 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
* 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P42271 and previous config saved to /var/cache/conftool/dbconfig/20221205-195842-ladsgroup.json
* 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 19:57 mutante: phab1004 (prod) - removing phab1001 from firewall rules, rsync config | phab1001 (formerly prod) - removing prod role [[phab:T323418|T323418]] [[phab:T280597|T280597]]
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42270 and previous config saved to /var/cache/conftool/dbconfig/20221205-194955-ladsgroup.json
* 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42269 and previous config saved to /var/cache/conftool/dbconfig/20221205-194757-ladsgroup.json
* 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42268 and previous config saved to /var/cache/conftool/dbconfig/20221205-193949-ladsgroup.json
* 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42267 and previous config saved to /var/cache/conftool/dbconfig/20221205-193448-ladsgroup.json
* 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42266 and previous config saved to /var/cache/conftool/dbconfig/20221205-193250-ladsgroup.json
* 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P42265 and previous config saved to /var/cache/conftool/dbconfig/20221205-193203-ladsgroup.json
* 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42264 and previous config saved to /var/cache/conftool/dbconfig/20221205-192442-ladsgroup.json
* 19:24 mutante: phab1001, previous long time phabricator host, is about to be shut down, made a final copy of /srv/deployment, /root, /home, /etc and synced it to phab1004 - [[phab:T323418|T323418]]
* 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42263 and previous config saved to /var/cache/conftool/dbconfig/20221205-191656-ladsgroup.json
* 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42262 and previous config saved to /var/cache/conftool/dbconfig/20221205-190935-ladsgroup.json
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42261 and previous config saved to /var/cache/conftool/dbconfig/20221205-190710-ladsgroup.json
* 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42260 and previous config saved to /var/cache/conftool/dbconfig/20221205-190150-ladsgroup.json
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42259 and previous config saved to /var/cache/conftool/dbconfig/20221205-185429-ladsgroup.json
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42258 and previous config saved to /var/cache/conftool/dbconfig/20221205-185205-ladsgroup.json
* 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42257 and previous config saved to /var/cache/conftool/dbconfig/20221205-184950-ladsgroup.json
* 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42256 and previous config saved to /var/cache/conftool/dbconfig/20221205-184944-ladsgroup.json
* 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P42255 and previous config saved to /var/cache/conftool/dbconfig/20221205-184643-ladsgroup.json
* 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P42254 and previous config saved to /var/cache/conftool/dbconfig/20221205-183851-ladsgroup.json
* 18:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
* 18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42253 and previous config saved to /var/cache/conftool/dbconfig/20221205-183712-ladsgroup.json
* 18:37 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1033.eqiad.wmnet with OS bullseye
* 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42252 and previous config saved to /var/cache/conftool/dbconfig/20221205-183700-ladsgroup.json
* 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42251 and previous config saved to /var/cache/conftool/dbconfig/20221205-182155-ladsgroup.json
* 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5016.eqsin.wmnet
* 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:39 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:34 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5016.eqsin.wmnet
* 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
* 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
* 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=varnish-fe
* 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-be
* 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-tls
* 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
* 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-tls
* 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-be
* 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
* 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=ats-tls
* 17:28 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5024.eqsin.wmnet,service=ats-be
* 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1034.eqiad.wmnet with OS bullseye
* 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1035.eqiad.wmnet with OS bullseye
* 17:02 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
* 16:59 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
* 16:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
* 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5015.eqsin.wmnet
* 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 16:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 16:56 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
* 16:56 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
* 16:53 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
* 16:53 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 16:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5015.eqsin.wmnet
* 16:44 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1033.eqiad.wmnet with OS bullseye
* 16:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
* 16:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
* 16:41 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1034.eqiad.wmnet with OS bullseye
* 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=varnish-fe
* 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-be
* 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-tls
* 16:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1010.eqiad.wmnet with OS bullseye
* 16:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1035.eqiad.wmnet with OS bullseye
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-tls
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-be
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=ats-tls
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5027.eqsin.wmnet,service=ats-be
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-tls
* 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-be
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=ats-tls
* 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5023.eqsin.wmnet,service=ats-be
* 16:27 klausman: restarted kube-apiserver on ml-staging-ctrl2001 to address high latency
* 16:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
* 16:11 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
* 16:06 klausman: restarted kube-apiserver on ml-serve-ctrl1001 to address high latency and large number of 504s
* 16:06 moritzm: installing glibc security updates on buster
* 15:46 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1010.eqiad.wmnet with OS bullseye
* 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5012,5014].eqsin.wmnet
* 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 15:36 moritzm: installing apache2 security updates on buster
* 15:35 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5012,5014].eqsin.wmnet
* 15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
* 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=varnish-fe
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-be
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-tls
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=varnish-fe
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-be
* 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-tls
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-tls
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=ats-tls
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5026.eqsin.wmnet,service=ats-be
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-tls
* 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-be
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=ats-tls
* 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5022.eqsin.wmnet,service=ats-be
* 15:14 andrewbogott: deleted wikitech-static-ord-prebuster image backup in rackspace cloud. Here concludes the wikitech-static upgrade to Buster and php7.4
* 15:07 root@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:06 root@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:06 root@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:05 root@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5011,5013].eqsin.wmnet
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 14:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 14:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 14:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 14:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 14:54 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 14:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 14:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5011,5013].eqsin.wmnet
* 14:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
* 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=varnish-fe
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-be
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-tls
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=varnish-fe
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-be
* 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-tls
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 14:37 TheresNoTime: closing UTC afternoon backport window
* 14:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:863467{{!}}logos: icon could be not square]], [[gerrit:864766{{!}}trwiki: Add 20 years celebration logos (T324393)]] (duration: 08m 37s)
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-tls
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=ats-tls
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5025.eqsin.wmnet,service=ats-be
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-tls
* 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=ats-tls
* 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5021.eqsin.wmnet,service=ats-be
* 14:29 samtar@deploy1002: samtar and stang: Backport for [[gerrit:863467{{!}}logos: icon could be not square]], [[gerrit:864766{{!}}trwiki: Add 20 years celebration logos (T324393)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42249 and previous config saved to /var/cache/conftool/dbconfig/20221205-142752-marostegui.json
* 14:27 samtar@deploy1002: Started scap: Backport for [[gerrit:863467{{!}}logos: icon could be not square]], [[gerrit:864766{{!}}trwiki: Add 20 years celebration logos (T324393)]]
* 14:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:862247{{!}}Add Property (120) to Wikidata content Namespace (T321282)]] (duration: 16m 59s)
* 14:18 samtar@deploy1002: samtar and gtzatchkova: Backport for [[gerrit:862247{{!}}Add Property (120) to Wikidata content Namespace (T321282)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 14:09 samtar@deploy1002: Started scap: Backport for [[gerrit:862247{{!}}Add Property (120) to Wikidata content Namespace (T321282)]]
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2127 [[phab:T324180|T324180]]', diff saved to https://phabricator.wikimedia.org/P42247 and previous config saved to /var/cache/conftool/dbconfig/20221205-135932-ladsgroup.json
* 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 primary [[phab:T324180|T324180]]', diff saved to https://phabricator.wikimedia.org/P42246 and previous config saved to /var/cache/conftool/dbconfig/20221205-135539-ladsgroup.json
* 13:55 Amir1: Starting s3 codfw failover from db2127 to db2105 - [[phab:T324180|T324180]]
* 13:51 dcausse: repooling wdqs1004
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55818
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2105 with weight 0 [[phab:T324180|T324180]]', diff saved to https://phabricator.wikimedia.org/P42245 and previous config saved to /var/cache/conftool/dbconfig/20221205-134346-ladsgroup.json
* 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 [[phab:T324180|T324180]]
* 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 [[phab:T324180|T324180]]
* 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55818
* 13:31 TheresNoTime: [[phab:T302486|T302486]] : [samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --ns 828 --delete
* 13:24 moritzm: installing postgresql-common bugfix updates from Buster 10.13 point release
* 13:17 moritzm: installing distro-info-data bugfix updates from Buster 10.13 point release
* 13:12 moritzm: installing libnet-ssleay-perl bugfix updates from Buster 10.13 point release
* 12:50 moritzm: installing python-keystoneauth1 bugfix updates from Buster 10.13 point release
* 12:41 root@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 12:41 root@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 12:41 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:39 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 11:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 11:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 11:51 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 11:50 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42243 and previous config saved to /var/cache/conftool/dbconfig/20221205-113746-marostegui.json
* 11:31 moritzm: installing librsvg bugfix updates from buster point release
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42242 and previous config saved to /var/cache/conftool/dbconfig/20221205-111836-marostegui.json
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
* 11:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
* 11:07 hashar: Restarted Zuul to clear a stuck ssh connection with Gerrit - [[phab:T309376|T309376]]
* 10:33 kostajh: UTC morning deploys done
* 10:32 godog: contint1001 - racadm serveraction powercycle - crashed
* 10:31 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864713{{!}}User impact: Show discovery notice to mobile users (T323619)]] (duration: 09m 30s)
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42241 and previous config saved to /var/cache/conftool/dbconfig/20221205-103028-marostegui.json
* 10:23 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864713{{!}}User impact: Show discovery notice to mobile users (T323619)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 10:22 kharlan@deploy1002: Started scap: Backport for [[gerrit:864713{{!}}User impact: Show discovery notice to mobile users (T323619)]]
* 10:14 Emperor: rebalance thanos rings [[phab:T311690|T311690]]
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42240 and previous config saved to /var/cache/conftool/dbconfig/20221205-100607-marostegui.json
* 10:05 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864712{{!}}User impact: Show discovery tour to desktop users who had old module (T323619)]] (duration: 27m 33s)
* 09:50 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864712{{!}}User impact: Show discovery tour to desktop users who had old module (T323619)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 09:39 moritzm: restarting mediawiki canaries to pick up freetype security updates
* 09:38 godog: force a puppet run on physical hosts to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/860572
* 09:37 kharlan@deploy1002: Started scap: Backport for [[gerrit:864712{{!}}User impact: Show discovery tour to desktop users who had old module (T323619)]]
* 09:36 moritzm: installing freetype security updates
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42239 and previous config saved to /var/cache/conftool/dbconfig/20221205-091547-marostegui.json
* 09:15 kharlan@deploy1002: backport aborted:  (duration: 00m 25s)
* 09:14 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864666{{!}}Fix ExpensiveUserImpact input validation (T324312)]] (duration: 09m 10s)
* 09:06 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864666{{!}}Fix ExpensiveUserImpact input validation (T324312)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 09:05 kharlan@deploy1002: Started scap: Backport for [[gerrit:864666{{!}}Fix ExpensiveUserImpact input validation (T324312)]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42238 and previous config saved to /var/cache/conftool/dbconfig/20221205-090214-marostegui.json
* 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59689
* 09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59689
* 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58308
* 08:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58308
* 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141731
* 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141731
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52580
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52580
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42237 and previous config saved to /var/cache/conftool/dbconfig/20221205-085235-marostegui.json
* 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136907
* 08:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136907
* 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55818
* 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55818
* 08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38623
* 08:48 kharlan@deploy1002: Finished scap: Backport for [[gerrit:859991{{!}}GrowthExperiments: End imagerecommendation experiment (T323686)]] (duration: 09m 26s)
* 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38623
* 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4788
* 08:40 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:859991{{!}}GrowthExperiments: End imagerecommendation experiment (T323686)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 08:38 kharlan@deploy1002: Started scap: Backport for [[gerrit:859991{{!}}GrowthExperiments: End imagerecommendation experiment (T323686)]]
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4788
* 08:35 kartik@deploy1002: Finished scap: Backport for [[gerrit:863097{{!}}Enable Section Translation on 8 Wikipedias (T319176)]] (duration: 09m 57s)
* 08:29 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web,name=eqiad
* 08:27 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:863097{{!}}Enable Section Translation on 8 Wikipedias (T319176)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:25 kartik@deploy1002: Started scap: Backport for [[gerrit:863097{{!}}Enable Section Translation on 8 Wikipedias (T319176)]]
* 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
* 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42236 and previous config saved to /var/cache/conftool/dbconfig/20221205-082320-marostegui.json
* 08:22 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-web,name=eqiad
* 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2002.codfw.wmnet,service=thanos-web
* 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2003.codfw.wmnet,service=thanos-web
* 08:20 kartik@deploy1002: Finished scap: Backport for [[gerrit:862412{{!}}testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177)]] (duration: 17m 25s)
* 08:11 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:862412{{!}}testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:05 dcausse: restarting blazegraph on wdqs1004 (stuck with 2000+ threads, [[phab:T242453|T242453]])
* 08:02 kartik@deploy1002: Started scap: Backport for [[gerrit:862412{{!}}testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177)]]
* 07:57 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
* 07:56 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1003.eqiad.wmnet,service=thanos-web
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P42234 and previous config saved to /var/cache/conftool/dbconfig/20221205-074804-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 100%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42233 and previous config saved to /var/cache/conftool/dbconfig/20221205-074655-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P42232 and previous config saved to /var/cache/conftool/dbconfig/20221205-073259-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 75%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42231 and previous config saved to /var/cache/conftool/dbconfig/20221205-073150-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P42230 and previous config saved to /var/cache/conftool/dbconfig/20221205-071754-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 50%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42229 and previous config saved to /var/cache/conftool/dbconfig/20221205-071645-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P42228 and previous config saved to /var/cache/conftool/dbconfig/20221205-070250-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 25%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42227 and previous config saved to /var/cache/conftool/dbconfig/20221205-070140-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42226 and previous config saved to /var/cache/conftool/dbconfig/20221205-065151-marostegui.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P42225 and previous config saved to /var/cache/conftool/dbconfig/20221205-064745-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 10%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42224 and previous config saved to /var/cache/conftool/dbconfig/20221205-064635-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42223 and previous config saved to /var/cache/conftool/dbconfig/20221205-063743-marostegui.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P42222 and previous config saved to /var/cache/conftool/dbconfig/20221205-063240-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 5%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42221 and previous config saved to /var/cache/conftool/dbconfig/20221205-063130-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to dbctl (depooled)', diff saved to https://phabricator.wikimedia.org/P42220 and previous config saved to /var/cache/conftool/dbconfig/20221205-063020-marostegui.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P42219 and previous config saved to /var/cache/conftool/dbconfig/20221205-061735-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 1%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42218 and previous config saved to /var/cache/conftool/dbconfig/20221205-061625-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P42217 and previous config saved to /var/cache/conftool/dbconfig/20221205-061616-marostegui.json


== 2020-05-14 ==
== 2022-12-04 ==
* 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Revert temporary 20k logo for vecwiki ([[phab:T252770|T252770]]) (duration: 01m 06s)
* 04:19 TheresNoTime: [[phab:T302486|T302486]] : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`
* 23:23 RoanKattouw: Ran namespaceDupes.php for [[phab:T252343|T252343]]
* 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create Gapura (Portal) namespace on jvwiki ([[phab:T252343|T252343]]) (duration: 01m 06s)
* 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.ub.uni-heidelberg.de and hq.eso.org to $wgCopyUploadDomains ([[phab:T252600|T252600]], [[phab:T252726|T252726]]) (duration: 01m 07s)
* 21:43 ryankemper: depooled wdqs2006 while lag recovers
* 21:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:16 volans: moved codereview.tar.gz and with_r.tar.gz from miscweb1002 to cumin1001 to free space
* 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: Allow plain text labels in side bar - [[phab:T252727|T252727]] (duration: 01m 06s)
* 19:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:49 ryankemper: Depooled wdqs1006 in preparation for impending wdqs data xfer
* 18:36 Urbanecm: Morning SWAT done
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|15adbbc}}: [thwikisource] Set ProofReadPage separator to an empty string ([[phab:T252610|T252610]]) (duration: 01m 06s)
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b8399c}}: Undeploy graphoid from mediawikiwiki ([[phab:T242855|T242855]]) (duration: 01m 05s)
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|f03a45c}}: Adding import to test wikis from mediawikiwiki ([[phab:T242855|T242855]]) (duration: 01m 07s)
* 17:03 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 1 member 1 - [[phab:T252797|T252797]]
* 16:55 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 3 member 1 - [[phab:T252797|T252797]]
* 16:51 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 port 48 member 2 - [[phab:T252797|T252797]]
* 16:50 XioNoX: request virtual-chassis vc-port set pic-slot 1 port 2 member 1 - [[phab:T252797|T252797]]
* 16:42 XioNoX: request virtual-chassis vc-port delete pic-slot 1 port 2 member 1 - [[phab:T252797|T252797]]
* 16:36 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 port 48 member 2 - [[phab:T252797|T252797]]
* 15:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:25 XioNoX: disable asw2-d1-eqiad:et-1/1/0 - [[phab:T251663|T251663]]
* 14:39 mutante: kuai kuai is https://twitter.com/Arlieth/status/1257714333133357056 {{!}} https://en.wikipedia.org/wiki/Kuai_Kuai_culture
* 13:31 _joe_: updating purged to 0.11 in eqiad,eqsin,esams
* 12:47 vgutierrez: rolling upgrade ats to version 8.0.7-1wm7
* 12:46 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 12:43 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 12:22 kormat: reverted iosched on pc1010 to `mq-deadline` [[phab:T252761|T252761]]
* 11:47 kormat: changed iosched on pc1010 to `none` as a test [[phab:T252761|T252761]]
* 11:07 matthiasmullie: EU swat done
* 11:05 mlitn@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/WikibaseMediaInfo/: [MediaInfo] Enable media search for all users by default (duration: 01m 12s)
* 11:04 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp3064
* 10:31 fdans@deploy1001: Finished deploy [analytics/refinery@6f13979]: Regular analytics weekly train (duration: 17m 14s)
* 10:14 fdans@deploy1001: Started deploy [analytics/refinery@6f13979]: Regular analytics weekly train
* 09:58 elukey: remove matomo 3.11 from the main component of stretch-wikimedia
* 09:56 elukey: upgrade matomo on matomo1001 to 3.13.3 (latest upstream) - [[phab:T252741|T252741]]
* 09:30 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:29 elukey: upload matomo-3.13.3 to thirdparty/matomo on stretch{{!}}buster-wikimedia
* 09:22 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 08:57 elukey: imported gpg key 1FD752571FE36FF23F78F91B81E2E78B66FED89E in apt1001 (Matomo public debian repo)
* 08:56 moritzm: installing Java security updates on Presto
* 08:43 jayme: updated helm: 2.12.2-1 -> 2.16.7-1 on deploy[1,2]001 and contint1001. 2.12.2-4 -> 2.16.7-1 on contint2001
* 08:39 jayme: imported helm 2.16.7-1 to main for jessie-wikimedia
* 08:32 moritzm: installing Java security updates on Hadoop/AQS/Druid
* 08:20 jayme@deploy2001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 08:00 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp5011
* 07:03 moritzm: installing apt security updates
* 06:33 ryankemper: Pooled wdqs2005 following successful test queries
* 04:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:02 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:59 ryankemper: wdqs1005 has been de-pooled pending wdqs data xfer
* 02:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 02:57 ryankemper: wdqs1004 was repooled after successful test queries
* 02:55 ryankemper: wdqs2006 was repooled after successful test queries
* 01:32 ryankemper: depooled wdqs2006 while waiting for lag to recover
* 00:54 foks: change password for "Python eggs"
* 00:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:31 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 twentyafterfour: phabricator update appears to be stable.
* 00:05 twentyafterfour: updating phabricator. 1 patch + new translations. Expect only brief downtime.


== 2020-05-13 ==
== 2022-12-03 ==
* 23:46 cstone: SmashPig revision changed from {{Gerrit|cd1a49da5f}} to {{Gerrit|2702b04329}}
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 23:43 ejegg: updated payments-wiki from {{Gerrit|dabba1804c}} to {{Gerrit|3c465cb11c}}
* 23:36 ejegg: rolled back payments-wiki to {{Gerrit|dabba1804c}}
* 23:29 ejegg: updated payments-wiki from {{Gerrit|dabba1804c}} to {{Gerrit|3c465cb11c}}
* 22:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:39 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper: Depooled wdqs1004 for subsequent wdqs data xfer
* 22:29 ryankemper: Pooled wdqs2005 given that lag has returned to normal levels and the instance is responding to queries correctly
* 22:26 ryankemper: Pooled wdqs1008 given that lag has returned to normal levels and the instance is responding to queries correctly
* 21:30 elukey: powercycle analytics1055
* 21:05 eileen: civicrm revision changed from {{Gerrit|cfb6101e39}} to {{Gerrit|ed4c9522ac}}, config revision is {{Gerrit|2eb75f8dff}}
* 20:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242430|T242430]] Stop loading the ParsoidBatchAPI extension (duration: 01m 08s)
* 19:09 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32 (duration: 01m 05s)
* 19:08 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32
* 18:54 twentyafterfour: restarted php-fpm on phab1001
* 18:53 thcipriani: restarting gerrit
* 18:52 twentyafterfour: restarting apache on phab1001 for lack of a better idea
* 18:50 herron: restarted kafka broker on kafka-main1001 for java security updates
* 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|38db3e0}}: Update production wordmarks ([[phab:T252143|T252143]]) (duration: 01m 07s)
* 18:17 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|38db3e0}}: Update production wordmarks ([[phab:T252143|T252143]]) (duration: 01m 09s)
* 17:55 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:24 ryankemper: Manually depooled wdqs2005 while lag catches up following the data xfer
* 17:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:18 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:12 urandom: restarted cassandra-c, restbase2017
* 17:04 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:11 James_F: Running AbuseFilter updateVarDumps on group0 on mwmaint1002 [[phab:T246539|T246539]]
* 16:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp4032
* 15:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:30 jayme: imported scap 3.14.0-1 to main for buster-wikimedia
* 15:30 jayme: imported scap 3.14.0-1 to main for jessie-wikimedia
* 15:29 ryankemper: Manually de-pooling `wdqs1008.eqiad.wmnet` in preparation for wdqs data transfer
* 15:29 jayme: imported scap 3.14.0-1 to main for stretch-wikimedia
* 15:26 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:55 _joe_: upgrading + restarting purged across ulsfo and codfw [[phab:T133821|T133821]]
* 14:50 filippo@deploy1001: Finished deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 [[phab:T251222|T251222]] (duration: 00m 10s)
* 14:50 filippo@deploy1001: Started deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 [[phab:T251222|T251222]]
* 14:35 vgutierrez: upload trafficserver 8.0.7-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]] [[phab:T251537|T251537]]
* 13:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:55 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:595881{{!}}Add *.deutsche-digitale-bibliothek.de to the wgCopyUploadsDomains (T252296)]] (duration: 01m 06s)
* 11:17 Amir1: EU SWAT is done
* 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:596180{{!}}Disable wgLegacyJavaScriptGlobals on fawiki and wikidatawiki (T72470)]] (duration: 01m 06s)
* 11:09 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:06 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:595544{{!}}Anchor RegExp for Data Bridge in Beta (BETA-ONLY)]] (duration: 01m 06s)
* 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 10:55 volans: imported tqdm 4.11.2-1 packages into buster-wikimedia component/spicerack
* 10:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:09 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master [[phab:T252182|T252182]] (duration: 01m 05s)
* 09:55 jbond42: deployed a fix to ferm-status script; unmanaged ferm rules may get removed
* 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:37 marostegui: Upgrade db2102 to the new 10.4.13 - [[phab:T250666|T250666]]
* 09:32 _joe_: installing purged 0.11 on cp2027 [[phab:T133821|T133821]]
* 09:21 _joe_: installing purged 0.11 on cp2028 [[phab:T133821|T133821]]
* 09:11 moritzm: re-enabling puppet
* 09:08 mutante: rsyncing /home dirs from people.wikimedia.org to new backend people1002
* 09:00 moritzm: disabling puppet temporarily
* 08:53 _joe_: uploaded purged 0.11
* 08:52 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 as pc1 master [[phab:T252182|T252182]] (duration: 01m 17s)
* 07:42 jayme: imported helm 2.16.7-1 to main for stretch-wikimedia
* 07:41 jayme: imported helm 2.16.7-1 to main for buster-wikimedia
* 07:29 godog: roll-restart logstash in codfw/eqiad for configuration change
* 07:14 elukey: upload spark2_2.4.4-bin-hadoop2.6-2 for buster/stretch on apt1001
* 05:33 ryankemper: wdqs2004 was depooled ~3 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
* 05:32 ryankemper: wdqs1003 was depooled ~6 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
* 05:27 _joe_: restarting php-fpm on mw1374, children dying with SIGILL
* 05:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 05:11 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 05:10 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 04:52 kart_: Updated cxserver to 2020-05-11-082207-production ([[phab:T250004|T250004]])
* 04:47 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:44 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:42 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:33 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer


== 2020-05-12 ==
== 2022-12-02 ==
* 23:09 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/includes/revisionlist/RevisionItemBase.php: Fix RevisionItemBase::getId to actually return an int, as intended - [[phab:T252076|T252076]] (duration: 01m 06s)
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 19:55 dpifke@deploy1001: Finished deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - [[phab:T238086|T238086]] (duration: 00m 05s)
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:55 dpifke@deploy1001: Started deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - [[phab:T238086|T238086]]
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.32
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:41 legoktm: started codereview-archiver script in screen on mwmaint1002
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:11 sukhe: restart pybal on lvs5004
* 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:49 bblack: 'gdnsdctl replace' on all authdns to load new maxmind data
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 17:43 bblack: updating maxmind database on puppetmasters (usually automated weekly; we're mid-cycle)
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 17:10 James_F: Running AbuseFilter updateVarDumps on testwikis on mwmaint1002 [[phab:T246539|T246539]]
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 16:55 James_F: Running AbuseFilter updateVarDumps on closed wikis on mwmaint1002 [[phab:T246539|T246539]]
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 16:55 mstyles@deploy1001: Finished deploy [wdqs/wdqs@f617307]: v0.3.31 (duration: 14m 53s)
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 16:40 mstyles@deploy1001: Started deploy [wdqs/wdqs@f617307]: v0.3.31
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 16:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:34 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 15:15 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 15:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:14 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 15:13 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:13 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 15:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 15:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 15:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:04 moritzm: installing 4.9.118 Linux updates on Buster nodes (reboots happening later)
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:02 moritzm: upgrading contint2001 to openjdk-8 u252
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:01 godog: bounce pybal on lvs2010 and lvs2009 - [[phab:T252186|T252186]]
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 14:40 moritzm: imported openjdk-8 u252 forward port for buster-wikimedia component/jdk8
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 14:40 ema: rolling thumbor upgrade to 2.8-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:39 andrewbogott: rebuilding cloudcontrol1003 and 1004
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 14:38 hashar: 1.35.0-wmf.32 is on test wikis. Will be pushed to group0 later today during the American window (19:00 - 21:00 UTC) # [[phab:T249964|T249964]]
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:34 ema: thumbor2001: repool
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 14:33 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - Test everywhere, SearchSatisfaction on testwiki only - [[phab:T249261|T249261]] (duration: 01m 06s)
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:33 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.8-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:23 moritzm: installing Java security updates on WDQS hosts
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:20 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.32 (duration: 72m 04s)
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 14:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:00 ema: thumbor2001: depool due to minor bug in 2.7-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 13:54 ema: thumbor2001: pool thumbor 2.7-1+deb10u1 for prod traffic [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 13:50 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.7-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 13:42 jbond42: disable puppet on all CP hosts to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/583342
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 13:36 kormat: reimaging pc2007 to buster [[phab:T252182|T252182]]
* 12:09 jynus: dropping all databases from db1133
* 13:36 moritzm: rebooting netflow* hosts for kernel update
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:33 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm5 - [[phab:T249335|T249335]]
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:31 moritzm: rebooting deneb for kernel update
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 13:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 13:08 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.32
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 13:05 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.28 (duration: 23m 47s)
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 12:37 moritzm: installing iputils update from Buster point release
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 12:08 hashar: Cutting branch 1.35.0-wmf.32 # [[phab:T249964|T249964]]
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 12:08 gehel: restart blazegraph + updater on wdqs2002 - JVM upgrade
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 11:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 11:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 11:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:55 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp5011 - [[phab:T249335|T249335]]
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 10:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 10:43 kormat: reimaging pc2010 to buster [[phab:T252182|T252182]]
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 10:30 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp4032 - [[phab:T249335|T249335]]
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 10:30 ema: rolling thumbor upgrade to 2.6-1+deb10u1 [[phab:T226707|T226707]]
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:19 ema: repool thumbor2001 with upgraded python-thumbor-wikimedia
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:13 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.6-1+deb10u1
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:04 godog: update compiler facts
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:34 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:29 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: cluster=thanos
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 09:07 moritzm: rebooting contint2001 for kernel update
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 09:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 07:46 godog: reboot thanos hosts for kernel upgrade
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 07:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 07:41 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:12 moritzm: rebooting the IDP hosts, SSO sessions will need to be renewed
* 07:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:56 vgutierrez: upload trafficserver 8.0.7-1wm4 to apt.wm.o (buster) - [[phab:T242767|T242767]] [[phab:T249335|T249335]]
* 05:29 marostegui: Restart docker-report-releng on deneb
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only=off for maintenance [[phab:T251502|T251502]]', diff saved to https://phabricator.wikimedia.org/P11180 and previous config saved to /var/cache/conftool/dbconfig/20200512-050339-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T251502|T251502]]', diff saved to https://phabricator.wikimedia.org/P11179 and previous config saved to /var/cache/conftool/dbconfig/20200512-050054-marostegui.json
* 04:46 marostegui: Stop mysql on labsdb1011 to transfer its content - [[phab:T249188|T249188]]
* 02:14 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:45 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-05-11 ==
== 2022-12-01 ==
* 21:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 21:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 19:08 cmjohnson@cumin1001: END (PASS)
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS


== 2020-05-10 ==
== Archives ==
* 12:18 marostegui: Start event scheduler on db1115 after a massive delete - [[phab:T252324|T252324]]
See [[Server Admin Log/Archives]].
* 11:05 marostegui: Stop event scheduler on db1115 to perform a massive delete - [[phab:T252324|T252324]]
* 10:27 dcausse: restarting blazegraph on wdqs1004: [[phab:T242453|T242453]]
* 09:56 marostegui: Change scaling_governor from powersave to performance on db1115 - [[phab:T252324|T252324]]
* 09:25 marostegui: Stop MySQL and restart db1115 - [[phab:T252324|T252324]]
* 08:50 marostegui: Restart mysql on db1115 to change buffer pool size from 20GB to 40GB [[phab:T252324|T252324]]
* 08:44 elukey: Power cycle analytics1052 after eno1 issue
* 08:01 marostegui: Disable unused events like %_schema [[phab:T252324|T252324]] [[phab:T231185|T231185]]
* 07:11 marostegui: Restart mysql on db1115 [[phab:T231185|T231185]]
* 07:11 marostegui: Truncate tendril.processlist_query_log [[phab:T231185|T231185]]
 
== 2020-05-08 ==
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 [[phab:T251598|T251598]]
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 [[phab:T251598|T251598]]
* 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 [[phab:T251598|T251598]]
* 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - [[phab:T252203|T252203]]
* 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for [[phab:T252121|T252121]]
* 17:59 marostegui: Extend /srv by 500G on labsdb1011 [[phab:T249188|T249188]]
* 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - [[phab:T252203|T252203]]
* 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - [[phab:T252203|T252203]]
* 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
* 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
* 14:05 akosiaris: [[phab:T243106|T243106]] undo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - [[phab:T249335|T249335]]
* 13:20 akosiaris: [[phab:T243106|T243106]] redo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:16 akosiaris: [[phab:T243106|T243106]] undo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure.
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle
* 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
* 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
* 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
* 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
* 08:20 vgutierrez: rolling restart of ats-tls on esams - [[phab:T249335|T249335]]
* 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - [[phab:T249335|T249335]]
* 07:07 mutante: phabricator rmdir /var/run/phd/pid  - empty and now unused
* 07:01 moritzm: installing php5 security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:10 marostegui: Upgrade pc1010
* 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for [[phab:T252179|T252179]]
* 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for [[phab:T252179|T252179]]
 
== 2020-05-07 ==
* 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: [[gerrit:595054{{!}}Handle RevisionAccessException with try-catch (T252156)]] (duration: 01m 08s)
* 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - [[phab:T238230|T238230]] (duration: 01m 07s)
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
* 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
* 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - [[phab:T252010|T252010]]
* 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]] (duration: 01m 17s)
* 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]]
* 18:15 Urbanecm: Morning SWAT done
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 2/2) (duration: 01m 06s)
* 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 1/2) (duration: 01m 08s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|54bd2f1}}: Add the investigate right to the checkuser group on testwiki ([[phab:T251932|T251932]]) (duration: 01m 08s)
* 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
* 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
* 17:18 ejegg: updated payments-wiki from {{Gerrit|afb84cc391}} to {{Gerrit|dabba1804c}}
* 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
* 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
* 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
* 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
* 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
* 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: [[phab:T251460|T251460]] Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
* 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:27 vgutierrez: rolling restart of ats-tls on text@esams - [[phab:T249335|T249335]]
* 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
* 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
* 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
* 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert [[phab:T172489|T172489]]
* 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
* 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
* 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
* 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
* 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] [[phab:T252079|T252079]] Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
* 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
* 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
* 11:10 matthiasmullie: EU swat done
* 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
* 10:07 moritzm: installing Java security updates on restbase/sessionstore
* 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
* 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
* 08:06 jynus: setting pc2007, pc2009 as read-write
* 07:44 godog: further decrease weight for ms-be10[678] - [[phab:T252008|T252008]]
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
* 05:22 marostegui: Reimage db2078
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
* 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
 
== 2020-05-06 ==
* 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias ([[phab:T245791|T245791]]) (duration: 01m 07s)
* 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: [[gerrit:594803{{!}}RevisionItem: Fix providing timestamp in getRevisionLink ]] (duration: 01m 09s)
* 21:45 andrewbogott: updating puppet compiler facts
* 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:35 ejegg: updated Fundraising CiviCRM from {{Gerrit|b15b2cfbb5}} to {{Gerrit|cfb6101e39}}
* 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
* 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
* 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group1
* 19:03 brennen: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group0
* 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN [[phab:T252052|T252052]] (duration: 01m 09s)
* 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
* 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
* 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
* 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes [[phab:T252043|T252043]] (duration: 01m 08s)
* 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as additional SAN ([[phab:T243056|T243056]])
* 13:32 hashar: Restarting CI Jenkins
* 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
* 13:27 moritzm: installing graphicsmagick security updates
* 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - [[phab:T252010|T252010]]
* 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - [[phab:T252010|T252010]]
* 13:19 ema: cp: upgrade purged to v0.10
* 13:08 godog: start swift decom ms-be101[678] - [[phab:T252008|T252008]]
* 11:22 kart_: EU SWAT done.
* 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}594668{{!}}Enable ContentTranslation in Armenian WP as a default tool (T249229)]] (duration: 01m 08s)
* 10:27 ema: cp2027: test purged v0.10
* 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
* 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 09:52 jbond42: enable remember-me feature of CAS
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
* 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - [[phab:T251158|T251158]]
* 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
* 08:53 jynus: kill FTWRL on db2101
* 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 [[phab:T99740|T99740]] (duration: 01m 16s)
* 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic ([[phab:T99740|T99740]])
* 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
* 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
* 06:00 elukey: powercycle analytics1060 - host stuck - [[phab:T251973|T251973]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out [[phab:T250055|T250055]]', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
* 05:02 marostegui: Deploy schema change on db1121
 
== 2020-05-05 ==
* 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki ([[phab:T249643|T249643]]) (duration: 01m 06s)
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
* 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
* 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
* 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] take 2 (duration: 01m 06s)
* 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] (duration: 01m 05s)
* 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: [[phab:T251950|T251950]] (duration: 01m 06s)
* 20:02 herron: added ryankemper to wmf and ops ldap groups [[phab:T251572|T251572]]
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
* 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
* 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
* 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
* 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% ([[phab:T249963|T249963]], [[phab:T223287|T223287]])
* 18:39 cdanis: depool mw2221 for some manual testing
* 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
* 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
* 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
* 16:48 brennen: 1.35.0-wmf.31 was branched at {{Gerrit|4d3fed31a435e7bd24925a154f89a9407670986d}} for [[phab:T249963|T249963]]
* 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
* 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) at 16:30 UTC
* 15:47 cstone: SmashPig revision changed from {{Gerrit|8c30ed7fe5}} to {{Gerrit|cd1a49da5f}}
* 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
* 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
* 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
* 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. [[phab:T219921|T219921]]
* 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # [[phab:T97513|T97513]]
* 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:18 vgutierrez: upgrade ATS to version 8.1 on cp4026, cp4032, cp5006 and cp5011
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
* 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
* 12:37 XioNoX: push pfw policy - [[phab:T251769|T251769]]
* 12:07 jbond42: updating cas login page
* 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
* 11:47 moritzm: rolling restart of apache on kibana hosts
* 11:41 mutante: LDAP - added eamedia to wmf group ([[phab:T251358|T251358]])
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
* 11:30 marostegui: Drop T248086_wb_terms table on labsdb hosts - [[phab:T248086|T248086]]
* 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
* 11:22 kart_: EU SWAT done.
* 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592479|Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383)]] (duration: 01m 01s)
* 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
* 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251660|T251660]])
* 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251575|T251575]])
* 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:36 moritzm: removing boron.eqiad.wmnet
* 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 09:03 gehel: restarting wdqs updater on all servers
* 08:53 moritzm: installing Java security updates on releases*
* 08:44 kormat: reimaging es1024 to buster [[phab:T250666|T250666]]
* 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 08:26 moritzm: upgrading slapd on serpens/seaborgium
* 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 08:08 moritzm: installing Java security updates on notebook/stat hosts
* 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
* 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
* 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
* 06:59 addshore: depool wdqs1006 heavy lag
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
* 05:19 marostegui: Start s5 and s6 maintenance - [[phab:T251154|T251154]]
* 04:39 marostegui: Restart mysql on tendril host: db1115 - [[phab:T231769|T231769]]
 
== 2020-05-04 ==
* 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
* 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
* 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
* 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
* 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
* 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
* 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T251835|T251835]]: Restore {{Gerrit|dc752af1e94684faacbe9662789815c6edbbdf46}} (duration: 00m 57s)
* 22:16 eileen: process-control config revision is {{Gerrit|2eb75f8dff}}
* 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]] (duration: 00m 05s)
* 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]]
* 18:16 Urbanecm: Morning SWAT done
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c04fbdd}}: Adding upload_by_url user right to all registered users on Commons ([[phab:T251474|T251474]]) (duration: 00m 57s)
* 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: {{Gerrit|b85fc16}}: Enable on all ExtraSignaturesNamespaces ([[phab:T249036|T249036]]) (duration: 01m 00s)
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|18c1efb}}: Load DiscussionTools on en.wiki ([[phab:T249376|T249376]]) (duration: 00m 58s)
* 17:57 XioNoX: configure singtel interface on cr1-eqsin
* 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
* 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}] (duration: 00m 09s)
* 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}]
* 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}] (duration: 16m 45s)
* 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}]
* 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
* 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
* 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
* 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
* 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: [[phab:T251457|T251457]] rdbms: don't treat lock() as a write operation (duration: 01m 04s)
* 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: [[phab:T250393|T250393]] Follow-up {{Gerrit|I07dd6f7}}: Fix font size in diff (duration: 01m 05s)
* 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
* 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
* 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}] (duration: 00m 10s)
* 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}]
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}] (duration: 15m 07s)