You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Older revision: imported>Stashbot (cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service)
Newer revision: imported>Stashbot (eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster)
(725 intermediate revisions by 4 users not shown)
== 2020-09-26 ==
* 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
* 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
* 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm


== 2020-09-25 ==
* 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables (duration: 26m 57s)
* 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables
* 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
* food: updated fundraising CiviCRM from {{Gerrit|eb90dbcfd3}} to {{Gerrit|035ad1c351}}
* 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
* 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
* 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
* 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
* 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
* 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
* 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: [[gerrit:630065{{!}}Make all section `collapsible` during server side rendering (T263832)]] (duration: 00m 59s)
* 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
* 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
* 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
* 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
* 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide [[phab:T263842|T263842]]
* 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups [[phab:T263842|T263842]]', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 09:58 moritzm: restarting archiva to pick up Java security update
* 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing [[phab:T261130|T261130]])
* 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
* 06:34 elukey: reboot stat1004 to pick up kernel settings
* 03:10 ejegg: updated payments-wiki from {{Gerrit|f89c594e12}} to {{Gerrit|b2eb456ed1}}
* 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]] (duration: 09m 05s)
* 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
* 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]]
* 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]] (duration: 06m 09s)
* 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]]


== 2022-12-09 ==
* 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster
* 01:07 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2003
* 01:06 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2003
* 01:06 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2002
* 01:05 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2002
* 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
* 01:04 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
* 01:01 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2003.codfw.wmnet
* 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:45 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:43 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:39 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2003.codfw.wmnet
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2002.codfw.wmnet
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:37 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 00:34 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 00:30 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2002.codfw.wmnet


== 2022-12-08 ==
* 23:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]] (duration: 08m 42s)
* 23:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 23:23 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:866470{{!}}Make wikibase.client.init module target mobile (T235712)]]
* 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS buster
* 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
* 23:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
* 23:11 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]] (duration: 09m 12s)
* 23:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:02 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:866469{{!}}File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)]]
* 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
* 22:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
* 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS buster
* 22:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 22:28 TheresNoTime: close UTC late backport and config training (+28m)
* 22:27 samtar@deploy1002: Finished scap: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]] (duration: 09m 57s)
* 22:19 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 22:17 samtar@deploy1002: Started scap: Backport for [[gerrit:866502{{!}}Start mobile DiscussionTools A/B test (T321961)]]
* 22:16 samtar@deploy1002: Finished scap: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]] (duration: 10m 06s)
* 22:08 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:06 samtar@deploy1002: Started scap: Backport for [[gerrit:866467{{!}}Deemphasize "Learn more about this page" link (T324702)]], [[gerrit:866468{{!}}Reinitialize edit links after page content is reloaded (T324686)]]
* 22:05 samtar@deploy1002: Finished scap: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]] (duration: 11m 04s)
* 22:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 21:56 samtar@deploy1002: samtar and stang: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:54 samtar@deploy1002: Started scap: Backport for [[gerrit:866339{{!}}frwikiversity: Set wgRestrictDisplayTitle to false (T324277)]], [[gerrit:866432{{!}}extwiki: Add new logo (T318766)]]
* 21:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]] (duration: 09m 16s)
* 21:46 samtar@deploy1002: samtar and stang: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 21:44 samtar@deploy1002: Started scap: Backport for [[gerrit:865763{{!}}specieswiki: Install GeoData extension (T324348)]]
* 21:39 TheresNoTime: [[phab:T324348|T324348]] : `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php specieswiki geodata`
* 21:37 samtar@deploy1002: Finished scap: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]] (duration: 08m 01s)
* 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 21:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
* 21:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 21:31 samtar@deploy1002: samtar and stang: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:29 samtar@deploy1002: Started scap: Backport for [[gerrit:865748{{!}}createExtensionTables: Add extension GeoData (T324348)]]
* 21:27 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]] (duration: 08m 31s)
* 21:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 21:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 21:21 jdrewniak@deploy1002: jdrewniak and jdrewniak: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:19 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:866505{{!}}Add elwiki and arwiki to desktop-improvements group (T322391)]]
* 21:17 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:866501{{!}} Bumping portals to master (T128546)]] (duration: 06m 55s)
* 21:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 21:10 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:866501{{!}} Bumping portals to master (T128546)]] (duration: 07m 07s)
* 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 20:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 20:34 ryankemper: [Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005
* 20:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 20:27 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:27 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 20:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 20:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for [[phab:T322776|T322776]]
* 20:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for [[phab:T322776|T322776]]
* 20:17 ryankemper: [[phab:T323064|T323064]] Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1
* 20:12 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13 refs [[phab:T320518|T320518]]
* 20:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 19:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 19:59 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - [[phab:T322776|T322776]]
* 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 16:14 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001
* 16:14 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001
* 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
* 16:12 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
* 16:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 16:08 eevans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 16:08 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye
* 15:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
* 15:48 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
* 15:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
* 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
* 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 15:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
* 15:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
* 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye
* 15:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 15:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 15:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
* 15:12 hashar: Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # [[phab:T323754|T323754]]
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet
* 15:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
* 15:09 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 15:08 jiji@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 15:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 15:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 15:07 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 15:05 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2001.codfw.wmnet
* 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42650 and previous config saved to /var/cache/conftool/dbconfig/20221208-150109-ladsgroup.json
* 14:59 hashar: Restarting Gerrit replica TWICE on gerrit2002.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # [[phab:T323754|T323754]]
* 14:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:50 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:50 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:47 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42649 and previous config saved to /var/cache/conftool/dbconfig/20221208-144602-ladsgroup.json
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42648 and previous config saved to /var/cache/conftool/dbconfig/20221208-144152-ladsgroup.json
* 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42647 and previous config saved to /var/cache/conftool/dbconfig/20221208-144131-ladsgroup.json
* 14:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:40 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42646 and previous config saved to /var/cache/conftool/dbconfig/20221208-142625-ladsgroup.json
* 14:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:20 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:20 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:19 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:19 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 14:13 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42645 and previous config saved to /var/cache/conftool/dbconfig/20221208-141118-ladsgroup.json
* 14:10 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 14:09 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 14:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 14:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42644 and previous config saved to /var/cache/conftool/dbconfig/20221208-135611-ladsgroup.json
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42643 and previous config saved to /var/cache/conftool/dbconfig/20221208-135402-ladsgroup.json
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42642 and previous config saved to /var/cache/conftool/dbconfig/20221208-135341-ladsgroup.json
* 13:43 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42641 and previous config saved to /var/cache/conftool/dbconfig/20221208-133835-ladsgroup.json
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42640 and previous config saved to /var/cache/conftool/dbconfig/20221208-132329-ladsgroup.json
* 13:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142] (duration: 00m 15s)
* 13:20 aqu@deploy1002: Started deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142]
* 13:19 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142] (duration: 00m 09s)
* 13:19 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142]
* 13:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42639 and previous config saved to /var/cache/conftool/dbconfig/20221208-130822-ladsgroup.json
* 13:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42638 and previous config saved to /var/cache/conftool/dbconfig/20221208-130612-ladsgroup.json
* 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42637 and previous config saved to /var/cache/conftool/dbconfig/20221208-130551-ladsgroup.json
* 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42635 and previous config saved to /var/cache/conftool/dbconfig/20221208-125045-ladsgroup.json
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
* 12:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42634 and previous config saved to /var/cache/conftool/dbconfig/20221208-124435-ladsgroup.json
* 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42633 and previous config saved to /var/cache/conftool/dbconfig/20221208-123538-ladsgroup.json
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42632 and previous config saved to /var/cache/conftool/dbconfig/20221208-122928-ladsgroup.json
* 12:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 12:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42631 and previous config saved to /var/cache/conftool/dbconfig/20221208-122032-ladsgroup.json
* 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42630 and previous config saved to /var/cache/conftool/dbconfig/20221208-121823-ladsgroup.json
* 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42629 and previous config saved to /var/cache/conftool/dbconfig/20221208-121801-ladsgroup.json
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42628 and previous config saved to /var/cache/conftool/dbconfig/20221208-121422-ladsgroup.json
* 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42627 and previous config saved to /var/cache/conftool/dbconfig/20221208-120255-ladsgroup.json
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42626 and previous config saved to /var/cache/conftool/dbconfig/20221208-115915-ladsgroup.json
* 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42625 and previous config saved to /var/cache/conftool/dbconfig/20221208-115659-ladsgroup.json
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42624 and previous config saved to /var/cache/conftool/dbconfig/20221208-115627-ladsgroup.json
* 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42623 and previous config saved to /var/cache/conftool/dbconfig/20221208-114748-ladsgroup.json
* 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42622 and previous config saved to /var/cache/conftool/dbconfig/20221208-114120-ladsgroup.json
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42621 and previous config saved to /var/cache/conftool/dbconfig/20221208-113240-ladsgroup.json
* 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42620 and previous config saved to /var/cache/conftool/dbconfig/20221208-113030-ladsgroup.json
* 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42619 and previous config saved to /var/cache/conftool/dbconfig/20221208-112951-ladsgroup.json
* 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42618 and previous config saved to /var/cache/conftool/dbconfig/20221208-112612-ladsgroup.json
* 11:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267] (duration: 00m 18s)
* 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267]
* 11:21 moritzm: drain ganeti5002 for eventual decom [[phab:T324610|T324610]]
* 11:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267] (duration: 00m 09s)
* 11:20 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267]
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42617 and previous config saved to /var/cache/conftool/dbconfig/20221208-111444-ladsgroup.json
* 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42616 and previous config saved to /var/cache/conftool/dbconfig/20221208-111105-ladsgroup.json
* 11:10 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between [[phab:T323771|T323771]]
* 11:09 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between [[phab:T323771|T323771]]
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42615 and previous config saved to /var/cache/conftool/dbconfig/20221208-110849-ladsgroup.json
* 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42614 and previous config saved to /var/cache/conftool/dbconfig/20221208-110828-ladsgroup.json
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42613 and previous config saved to /var/cache/conftool/dbconfig/20221208-105938-ladsgroup.json
* 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3 30 seconds in between [[phab:T323771|T323771]]
* 10:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42612 and previous config saved to /var/cache/conftool/dbconfig/20221208-105321-ladsgroup.json
* 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42611 and previous config saved to /var/cache/conftool/dbconfig/20221208-104432-ladsgroup.json
* 10:43 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between [[phab:T323771|T323771]]
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42610 and previous config saved to /var/cache/conftool/dbconfig/20221208-104322-ladsgroup.json
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42609 and previous config saved to /var/cache/conftool/dbconfig/20221208-104300-ladsgroup.json
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42608 and previous config saved to /var/cache/conftool/dbconfig/20221208-103815-ladsgroup.json
* 10:36 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]] (duration: 09m 17s)
* 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between
* 10:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42606 and previous config saved to /var/cache/conftool/dbconfig/20221208-102754-ladsgroup.json
* 10:26 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:865828{{!}}Set externallinks migration to WRITE_BOTH in testwiki (T321662)]]
* 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42605 and previous config saved to /var/cache/conftool/dbconfig/20221208-102308-ladsgroup.json
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42604 and previous config saved to /var/cache/conftool/dbconfig/20221208-102052-ladsgroup.json
* 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42603 and previous config saved to /var/cache/conftool/dbconfig/20221208-102030-ladsgroup.json
* 10:18 hashar: contint1002: activated Icinga monitoring , all services are up and running # [[phab:T313832|T313832]]
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42602 and previous config saved to /var/cache/conftool/dbconfig/20221208-101247-ladsgroup.json
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42600 and previous config saved to /var/cache/conftool/dbconfig/20221208-100524-ladsgroup.json
* 10:01 claime: Deploying puppet enforcement of zuul-merger on contint1002
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42599 and previous config saved to /var/cache/conftool/dbconfig/20221208-095741-ladsgroup.json
* 09:57 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host test-reimage2001.codfw.wmnet
* 09:56 steve_munene: restarting varnishkafka-webrequest.service on host cp1075 [[phab:T323771|T323771]]
* 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42598 and previous config saved to /var/cache/conftool/dbconfig/20221208-095017-ladsgroup.json
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) test-reimage2001.codfw.wmnet on all recursors
* 09:50 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache test-reimage2001.codfw.wmnet on all recursors
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
* 09:49 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
* 09:46 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:46 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host test-reimage2001.codfw.wmnet
* 09:43 hashar: contint1002: stopped puppet and manually started zuul-merger. I am monitoring it cause last time we have bring up a new one it had some issues here and there # [[phab:T313832|T313832]]
* 09:38 hashar: contint1001: manually stopped and masked zuul-merger. It is under maintenance mode in Icinga # [[phab:T313832|T313832]]
* 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42597 and previous config saved to /var/cache/conftool/dbconfig/20221208-093511-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42596 and previous config saved to /var/cache/conftool/dbconfig/20221208-093255-ladsgroup.json
* 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42595 and previous config saved to /var/cache/conftool/dbconfig/20221208-093218-ladsgroup.json
* 09:25 hashar@deploy1002: Finished deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # [[phab:T313832|T313832]] (duration: 00m 07s)
* 09:24 hashar@deploy1002: Started deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # [[phab:T313832|T313832]]
* 09:17 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # [[phab:T313832|T313832]] (duration: 00m 03s)
* 09:17 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # [[phab:T313832|T313832]]
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42594 and previous config saved to /var/cache/conftool/dbconfig/20221208-091712-ladsgroup.json
* 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42593 and previous config saved to /var/cache/conftool/dbconfig/20221208-090205-ladsgroup.json
* 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42592 and previous config saved to /var/cache/conftool/dbconfig/20221208-085724-ladsgroup.json
* 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42591 and previous config saved to /var/cache/conftool/dbconfig/20221208-085657-ladsgroup.json
* 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42590 and previous config saved to /var/cache/conftool/dbconfig/20221208-084659-ladsgroup.json
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42589 and previous config saved to /var/cache/conftool/dbconfig/20221208-084442-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42588 and previous config saved to /var/cache/conftool/dbconfig/20221208-084421-ladsgroup.json
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42587 and previous config saved to /var/cache/conftool/dbconfig/20221208-084151-ladsgroup.json
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42586 and previous config saved to /var/cache/conftool/dbconfig/20221208-082914-ladsgroup.json
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42585 and previous config saved to /var/cache/conftool/dbconfig/20221208-082644-ladsgroup.json
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42584 and previous config saved to /var/cache/conftool/dbconfig/20221208-081408-ladsgroup.json
* 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42583 and previous config saved to /var/cache/conftool/dbconfig/20221208-081138-ladsgroup.json
* 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42582 and previous config saved to /var/cache/conftool/dbconfig/20221208-075901-ladsgroup.json
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42581 and previous config saved to /var/cache/conftool/dbconfig/20221208-075645-ladsgroup.json
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42580 and previous config saved to /var/cache/conftool/dbconfig/20221208-075624-ladsgroup.json
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42579 and previous config saved to /var/cache/conftool/dbconfig/20221208-074117-ladsgroup.json
* 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42578 and previous config saved to /var/cache/conftool/dbconfig/20221208-073122-ladsgroup.json
* 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42577 and previous config saved to /var/cache/conftool/dbconfig/20221208-073101-ladsgroup.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42576 and previous config saved to /var/cache/conftool/dbconfig/20221208-072611-ladsgroup.json
* 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42575 and previous config saved to /var/cache/conftool/dbconfig/20221208-071554-ladsgroup.json
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42574 and previous config saved to /var/cache/conftool/dbconfig/20221208-071104-ladsgroup.json
* 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42573 and previous config saved to /var/cache/conftool/dbconfig/20221208-070847-ladsgroup.json
* 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42572 and previous config saved to /var/cache/conftool/dbconfig/20221208-070825-ladsgroup.json
* 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42571 and previous config saved to /var/cache/conftool/dbconfig/20221208-070048-ladsgroup.json
* 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 31800
* 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 31800
* 06:55 bblack: lvs1017: restarting pybal to take back text traffic (med reverted to normal, underlying problem w/ ipv6 addressed)
* 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42570 and previous config saved to /var/cache/conftool/dbconfig/20221208-065319-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42569 and previous config saved to /var/cache/conftool/dbconfig/20221208-064541-ladsgroup.json
* 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42568 and previous config saved to /var/cache/conftool/dbconfig/20221208-063813-ladsgroup.json
* 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42567 and previous config saved to /var/cache/conftool/dbconfig/20221208-062306-ladsgroup.json
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42566 and previous config saved to /var/cache/conftool/dbconfig/20221208-062050-ladsgroup.json
* 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42565 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42564 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
* 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42563 and previous config saved to /var/cache/conftool/dbconfig/20221208-062006-ladsgroup.json
* 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42562 and previous config saved to /var/cache/conftool/dbconfig/20221208-061436-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42561 and previous config saved to /var/cache/conftool/dbconfig/20221208-060551-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42560 and previous config saved to /var/cache/conftool/dbconfig/20221208-060522-ladsgroup.json
* 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42559 and previous config saved to /var/cache/conftool/dbconfig/20221208-060500-ladsgroup.json
* 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42558 and previous config saved to /var/cache/conftool/dbconfig/20221208-055930-ladsgroup.json
* 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42557 and previous config saved to /var/cache/conftool/dbconfig/20221208-055046-ladsgroup.json
* 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42556 and previous config saved to /var/cache/conftool/dbconfig/20221208-055015-ladsgroup.json
* 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42555 and previous config saved to /var/cache/conftool/dbconfig/20221208-054953-ladsgroup.json
* 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42554 and previous config saved to /var/cache/conftool/dbconfig/20221208-054423-ladsgroup.json
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42553 and previous config saved to /var/cache/conftool/dbconfig/20221208-053541-ladsgroup.json
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42552 and previous config saved to /var/cache/conftool/dbconfig/20221208-053509-ladsgroup.json
* 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42551 and previous config saved to /var/cache/conftool/dbconfig/20221208-053447-ladsgroup.json
* 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42550 and previous config saved to /var/cache/conftool/dbconfig/20221208-053253-ladsgroup.json
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42549 and previous config saved to /var/cache/conftool/dbconfig/20221208-053236-ladsgroup.json
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 05:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 05:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42548 and previous config saved to /var/cache/conftool/dbconfig/20221208-052917-ladsgroup.json
* 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42547 and previous config saved to /var/cache/conftool/dbconfig/20221208-052705-ladsgroup.json
* 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42546 and previous config saved to /var/cache/conftool/dbconfig/20221208-052036-ladsgroup.json
* 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 03:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 02:24 bblack: lvs1017 - restart pybal manually again, back on bgp_med=101 (traffic goes back to lvs1020)
* 02:21 bblack: restarting pybal on lvs1017 manually again with bgp_med=0 (should take traffic, may or may not do so very usefully!)
* 02:05 bblack: sretest1001 - puppet disabled, manipulating routing on this host to conduct tests...
* 01:56 bblack: lvs1017 - manually setting BGP MED to 101 and starting pybal (should come back and speak BGP, but not steal traffic from lvs1020)
* 01:29 bblack: lvs1017 - disable puppet and stop pybal to fix ipv6 for now
* 01:27 bblack: lvs1017: restart pybal, attempt to fix text-ipv6 service
* 01:05 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic1" lvs at all sites - [[phab:T324336|T324336]]
* 01:00 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic2" lvs at all sites - [[phab:T324336|T324336]]
* 00:47 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "secondary" lvs at all sites - [[phab:T324336|T324336]] (5 hosts, ulsfo completed previously)
* 00:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1012.eqiad.wmnet with OS bullseye
* 00:29 bblack: lvs4010: restart pybal to test etcd key changes - [[phab:T324336|T324336]]
* 00:16 bblack: disabling puppet on all cp and lvs hosts for conftool key changes. Please coordinate if any lvs/pybal/cpNNNN depooling/work is needed during this transition!
* 00:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=cdn
* 00:12 bblack@cumin1001: conftool action : set/weight=1; selector: service=cdn
* 00:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage
* 00:04 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage


== 2020-09-24 ==
== 2022-12-07 ==
* 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
* 23:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1012.eqiad.wmnet with OS bullseye
* 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 21:40 mutante: mw1349 - systemctl reset-failed
* 23:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
* 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42545 and previous config saved to /var/cache/conftool/dbconfig/20221207-233130-ladsgroup.json
* 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
* 23:24 mutante: mx1001 about to run out of disk again - apt-get clean, gzip /var/log/exim4/mainlog.1, find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent the mail server from running out of disk - it was alerting in Icinga but, same as conf*, monitoring works and alerting does not [[phab:T305567|T305567]]
* 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
* 23:23 mutante: mx1001 - apt-get clean, gzip /var/log/exim4/mainlog.1, find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent the mail server from running out of disk - it was alerting in Icinga but, same as conf*, monitoring works and alerting does not
* 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42544 and previous config saved to /var/cache/conftool/dbconfig/20221207-231623-ladsgroup.json
* 23:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:865749{{!}}Make parsoid accept all content models. (T324711)]] (duration: 13m 57s)
* 23:02 samtar@deploy1002: samtar and samtar: Backport for [[gerrit:865749


== 2020-09-23 ==
== 2022-12-06 ==
* 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
* 23:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbd77e3dff0d56b851b3d15b4d267d1faacfae26}}: Add new Racine namespace to frwiktionary ([[phab:T263525|T263525]]) (duration: 01m 05s)
* 23:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
* 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 23:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 22:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:40 mholloway-
* 22:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:37 tgr_: UTC late backports done
* 22:36 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:26 tgr@deploy1002: Finished scap: Backport for [[gerrit:865131


== 2020-09-22 ==
== 2022-12-05 ==
* 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias ([[phab:T255585|T255585]]) (duration: 01m 04s)
* 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
* 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature ([[phab:T261249|T261249]]) (duration: 01m 06s)
* 23:57 tzatziki: removing 2 files for legal compliance
* 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
* 21:48 pt1979@cumin2001: START - Cookbook sre.hosts
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
* 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]:


== 2020-09-21 ==
== 2022-12-04 ==
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 04:19 TheresNoTime: [[phab:T302486|T302486]] : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`
* 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 23:36 mutante: debmonitor2002 - systemctl reset-failed
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/ backends in both DCs
* 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
* 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
* 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 [[phab:T243057|T243057]]
* 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 [[phab:T243057|T243057]]
* 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
* 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
* 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
* 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
* 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:18 mepps: updated crm to {{Gerrit|8f32b6301f}}
* 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:14 ejegg: updated fundraising CiviCRM from {{Gerrit|e5ebf9d18a}} to {{Gerrit|8f32b6301f}}
* 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 [[phab:T249745|T249745]] (duration: 00m 56s)
* 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}} (duration: 06m 54s)
* 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki ([[phab:T254239|T254239]]) and ptwiki ([[phab:T255027|T255027]]) (duration: 00m 56s)
* 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}}
* 18:33 mepps: updated crm from {{Gerrit|cc1f7e6d13}} to {{Gerrit|e5ebf9d18a}}
* 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) ([[phab:T261153|T261153]]) (duration: 00m 57s)
* 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 18:08 XioNoX: add NAT rule to pfw3-codfw - [[phab:T263488|T263488]]
* 17:42 papaul: rebooting ps1-a8-codfw firmware upgrade
* 16:46 papaul: shutting down ms-be2019 for BBU replacing
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
* 16:17 papaul: replacing msw-c8-codfw
* 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
* 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
* 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 56s)
* 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 59s)
* 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
* 15:24 hnowlan: roll-restarting restbase-dev for java security updates
* 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
* 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
* 15:07 moritzm: installing libx11 security updates on stretch
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
* 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
* 14:37 papaul: firmware upgrade on db2127
* 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
* 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
* 14:30 herron: moving prometheus from bast5001 to prometheus5001 [[phab:T243057|T243057]]
* 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end [[phab:T263138|T263138]]
* 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing [[phab:T263443|T263443]]
* 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
* 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end [[phab:T263138|T263138]]
* 14:00 moritzm: installing Java security updates on restbase/sessionstore*
* 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
* 13:21 moritzm: installing glib-networking security updates for Stretch
* 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing [[phab:T263443|T263443]]
* 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
* 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - [[phab:T263230|T263230]]
* 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing [[phab:T263443|T263443]]
* 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing [[phab:T263443|T263443]]
* 11:38 effie: restart pybal on lvs2009 and lvs1015 - [[phab:T256973|T256973]]
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
* 11:35 Urbanecm: EU B&C done
* 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: {{Gerrit|3fab5882505809b412cff641a17ae5d973db04a4}}: Simplify lead paragraph check (duration: 00m 59s)
* 11:22 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a62212a5a8f4692b860eb3d6c3322c82d88125a9}}: Allow local steward group members to bigdelete (duration: 00m 57s)
* 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # [[phab:T256348|T256348]] # P12683
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1cf4664df87f10bf60b47345dfe3c52d7dc24f6c}}: Set WT namespace alias to NS_PROJECT in shn.wiktionary ([[phab:T256348|T256348]]) (duration: 00m 57s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|01ba82866f3e04c7c635e9089fed4269190b93f0}}: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains ([[phab:T261037|T261037]]) (duration: 00m 57s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd51f47b1f60fbfafdcc623ae22dcadf2c927876}}: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki ([[phab:T262238|T262238]]) (duration: 00m 57s)
* 11:02 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 01m 12s)
* 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
* 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
* 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - [[phab:T262247|T262247]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 [[phab:T262247|T262247]]', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
* 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
* 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
* 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
* 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
* 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - [[phab:T257035|T257035]]
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
* 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
* 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - [[phab:T261717|T261717]]
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
* 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T238966|T238966]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
* 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json


== 2020-09-20 ==
== 2022-12-03 ==
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # [[phab:T263317|T263317]]
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 07:42 gehel: depooling wdqs2002 to catch up on lag
* 07:36 gehel: restarting blazegraph + updater on wdqs2002


== 2020-09-19 ==
== 2022-12-02 ==
* 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 19:11 sukhe: restart pybal on lvs5004
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 12:09 jynus: dropping all databases from db1133
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster


== 2020-09-18 ==
== 2022-12-01 ==
* 21:48 tzatziki: changed password for Millennium bug@ptwiki
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 19:28 eileen: process-control config revision is {{Gerrit|739ea754ca}}
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS


== 2020-09-17 ==
== Archives ==
* 23:41 ejegg: updated payments-wiki from {{Gerrit|86c997fdb2}} to {{Gerrit|7bb99ce03a}}
* 23:01 ejegg: updated payments-wiki from {{Gerrit|1e5a52ed26}} to {{Gerrit|86c997fdb2}}
* 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|19b9b9877ea3f8ffa6626108941891c2454348de}}: Fix APCOND_FR_NEVERBLOCKED handling (part 3; [[phab:T262970|T262970]]) (duration: 00m 57s)
* 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #[[phab:T262657|T262657]]
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:11 Urbanecm: Morning B&C done
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40591d3dfdc2fc360cb060770677a48e2a53362c}}: Enable DiscussionTools beta on jawiki & viwiki ([[phab:T261654|T261654]]; [[phab:T262109|T262109]]) (duration: 00m 56s)
* 18:06 Urbanecm: Move /srv/mediawiki-stagging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
* 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:20 rzl: repooled eqiad at 17:11
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 17:03 papaul: restarting ps1-d8-codfw
* 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
* 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
* 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 marostegui: Restart wikibugs
* 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:15 papaul: replacing msw-d8-codfw
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack [[phab:T262901|T262901]]', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
* 16:03 marostegui: Recreate db1131 on tendril [[phab:T262901|T262901]]
* 15:59 marostegui: Update rack location on zarcillo for db1131 [[phab:T262901|T262901]]
* 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
* 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
* 15:43 mepps: updated payments-wiki from {{Gerrit|3c073a6a56}} to {{Gerrit|1e5a52ed26}}
* 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
* 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
* 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
* 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
* 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw [[phab:T195578|T195578]]
* 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
* 14:49 cmjohnson1: ending pdu maintenance in eqiad
* 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
* 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
* 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
* 14:02 marostegui: Start mysql on db1125 after PDU maintenance [[phab:T261459|T261459]]
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
* 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
* 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance [[phab:T260670|T260670]]
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
* 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
* 11:24 matthiasmullie: End Euro B&C
* 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
* 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
* 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - [[phab:T263006|T263006]]
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - [[phab:T263006|T263006]]
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
* 10:58 marostegui: Stop mysql on db1125 for PDU maintenance; lag will appear on s2, s4, s6 and s7 on labsdb hosts [[phab:T261459|T261459]]
* 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
* 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
* 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
* 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - [[phab:T262527|T262527]]
* 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - [[phab:T262527|T262527]]
* 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:37 godog: graphite compress /var/log/carbon logs older than 2d
* 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:25 jayme: reboot kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 08:24 godog: graphite add 300G to /srv
* 07:55 jayme: draining kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 07:55 jayme: cordoning kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
* 06:55 hashar: Taking a heap dump of Gerrit JVM
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenace', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
* 05:46 marostegui: Stop mysql on db1131 - [[phab:T262901|T262901]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
* 04:53 marostegui: Deploy schema change on s1 eqiad primary master - [[phab:T238966|T238966]]
* 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
* 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
== 2020-09-16 ==
* 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module ([[phab:T258008|T258008]]); Revert wider task card on desktop ([[phab:T263042|T263042]], [[phab:T258704|T258704]]); Fix width of sidebar modules in narrow mode in variant A ([[phab:T263068|T263068]]) (duration: 01m 09s)
* 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
* 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki ([[phab:T262207|T262207]]) (duration: 01m 04s)
* 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: [[gerrit:627793{{!}}Check $coords matched some nodes before comparing contents (T263034)]] (duration: 01m 06s)
* 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
* 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
* 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
* 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
* 16:12 ryankemper: `wdqs` deploy complete, service is healthy
* 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (Ganeti VM, so it is manual work)
* 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
* 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
* 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:627871{{!}}Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060)]] (production no-op) (duration: 01m 04s)
* 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
* 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
* 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622994{{!}}Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060)]] (duration: 01m 05s)
* 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:622993{{!}}Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060)]] (duration: 01m 02s)
* 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622612{{!}}Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060)]] (duration: 01m 06s)
* 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
* 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
* 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up ([[phab:T262733|T262733]])
* 13:48 marostegui: Start mysql on db1121 after PDU work
* 13:46 James_F: Restarting CI Jenkins for [[phab:T262827|T262827]]
* 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
* 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
* 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
* 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 12:49 cmjohnson1: start pdu swap in racks c6 and c7, d8
* 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
* 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
* 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
* 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
* 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: [[phab:T263014|T263014]] Revert "Remove support for (Archived{{!}}OldLocal)File::userCan without a user" (duration: 01m 04s)
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
* 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
* 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
* 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia [[phab:T263006|T263006]]
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
* 10:01 akosiaris: [[phab:T187984|T187984]] Shutdown mendelevium.
* 09:43 jynus: deploying max_packet_size change to m3 instances, too
* 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
* 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
* 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
* 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - [[phab:T262290|T262290]]
* 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - [[phab:T262290|T262290]]
* 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work [[phab:T261454|T261454]] [[phab:T261457|T261457]]
* 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
* 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
* 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for [[phab:T262628|T262628]] (duration: 00m 59s)
* 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
* 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 08:04 akosiaris: [[phab:T187984|T187984]] Validated that ticket.wikimedia.org works, proceeding with a wider announcement
* 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 07:49 akosiaris: [[phab:T187984|T187984]] Switch over ticket.discovery.wmnet to otrs1001
* 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 07:37 akosiaris: [[phab:T187984|T187984]] Tested inbound email successfully
* 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:26 akosiaris: [[phab:T187984|T187984]] Tested outbound email, switching inbound email configuration and performing tests
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
* 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 07:12 akosiaris: [[phab:T187984|T187984]] Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
* 07:03 akosiaris: [[phab:T187984|T187984]] validated that the OTRS installation is functional over SSH
* 07:02 akosiaris: [[phab:T187984|T187984]] migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
* 06:28 godog: codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 06:20 kart_: Updated cxserver to 2020-08-30-011854-production ([[phab:T253439|T253439]], [[phab:T260557|T260557]])
* 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
* 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
* 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
* 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - [[phab:T262290|T262290]]
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
* 05:07 marostegui: Repool labsdb1010
* 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts

== 2020-09-15 ==
* 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|1c0b0d161fe1024d6d08a27bbacf5b62c56c9c01}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 56s)
* 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|5beace32a396adfcce46b04e7f969b2f9f9effda}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 58s)
* 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ac8bd3894f2dc8f2735cc9fa7b860af1d91c6707}}: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
* 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62b21d55a8f0a94b8cd268d5024df0cf64013dd5}}: Revert "Remove abusefilter-view right grant from wmf-config" ([[phab:T255506|T255506]]) (duration: 00m 59s)
* 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
* 18:32 Urbanecm: Morning B&C done
* 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|084729b7fd0716f11265f1b37570afc120b27109}}: Remove abusefilter-view right grant from wmf-config ([[phab:T255506|T255506]]) (duration: 00m 56s)
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1d3456570b80b1d8af1d2b71975496e54f87b24e}}: Enable MediaWiki client errors on frwiki ([[phab:T255585|T255585]]) (duration: 00m 57s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|79004b7e503c7274fa56d2699b423b6919fbc869}}: Enable the reverted tag on all wikis ([[phab:T164307|T164307]]) (duration: 00m 56s)
* 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: {{Gerrit|If727ae4335}} (duration: 00m 56s)
* 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
* 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
* 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
* 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
* 14:53 godog: switch grafana to eqiad - [[phab:T259143|T259143]]
* 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:38 XioNoX: remove old SNMP community from all network devices
* 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - [[phab:T251609|T251609]] (duration: 00m 56s)
* 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - [[phab:T251609|T251609]] (duration: 00m 06s)
* 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
* 12:18 jbond42: update libxml2 on stretch and jessie
* 12:08 jbond42: rolling restart of php7.2-fpm
* 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
* 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|294931fc6eb9e365894ec0cf94c155d55ecae549}}: Revert "Disable DynamicPageList on ruwikinews" ([[phab:T262240|T262240]]; [[phab:T262391|T262391]]) (duration: 00m 58s)
* 11:17 effie: roll out scap 3.15.0-1 to all - [[phab:T261234|T261234]]
* 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - [[phab:T246890|T246890]]
* 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - [[phab:T246890|T246890]]
* 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
* 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
* 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts [[phab:T261455|T261455]]
* 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:04 gehel: restart elasticsearch on elastic2029 (high GC)
* 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
* 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance [[phab:T261456|T261456]]
* 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 11m 10s)
* 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:01 akosiaris: [[phab:T187984|T187984]] migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
* 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
* 07:24 godog: swift codfw add ms-be2057 at object weight 100 - [[phab:T261633|T261633]]
* 07:19 elukey: roll restart druid cluster to pick up openjdk updates
* 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
* 06:57 liw: 1.36.0-wmf.9 was branched at {{Gerrit|7269b6b57b6f79646b96ece818d2f2d38e0d2ea6}} for [[phab:T257977|T257977]]
* 06:08 marostegui: Stop mysql on es2011 to clone es2028
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
* 05:33 marostegui: Depool labsdb1010 for PDU maintenance
* 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw [[phab:T262832|T262832]]

== 2020-09-14 ==
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:45 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:32 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 21:30 cdanis: [[phab:T257527|T257527]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'enable-puppet "cdanis rolling out Ifa3c68e4"'
* 21:24 cdanis: [[phab:T257527|T257527]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'disable-puppet "cdanis rolling out Ifa3c68e4"'
* 21:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:03 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 20:38 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 20:26 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a588eb0c6}} [[phab:T262087|T262087]] modify wgEventStreams to reference NEL schema (duration: 00m 56s)
* 19:00 Urbanecm: Morning B&C done
* 18:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a5d56edc7460ac43492f9c04cff86c1b03e56fa4}}: {{Gerrit|e2f47980c371b52b1b66957f2bff2266745ab00a}}: Enable Special:Investigate on eswiki ([[phab:T262436|T262436]]) (duration: 00m 56s)
* 18:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 18:38 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|7d1939323cc3ea5dacf67a43d4d359c114203a66}}: Remove investigate from $wgAvailableRights ([[phab:T260175|T260175]]) (duration: 00m 56s)
* 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2fa6533a594c8544342954eae19a4a0f7baeff0}}: Remove the investigate right from testwiki and frwiki ([[phab:T260175|T260175]]) (duration: 00m 56s)
* 18:30 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/EventStreamConfig/includes/: {{Gerrit|a4c86089371319ae5a3bb6053c4a9b3e83130286}}: Default to using API json formatversion=2 ([[phab:T251609|T251609]]) (duration: 00m 57s)
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27ba5a1da1fb00e721cfa82dd4cd1fbac2541053}}: add new parse* servers to $wgLinterSubmitterWhitelist ([[phab:T247441|T247441]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|720e6cbfe1800fe32dc65c209240ba08706dbb17}}: flaggedrevs: Move setting of wgFlaggedRevsAutopromote and wgFlaggedRevsAutoconfirm out of wgExtensionFunctions ([[phab:T237191|T237191]]) (duration: 00m 56s)
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|699f5e8c2a50f35e98850ea32f7847d183600351}}: Add logo Wordmark and Tagline for hywiki ([[phab:T259985|T259985]]) (duration: 00m 55s)
* 18:08 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|699f5e8c2a50f35e98850ea32f7847d183600351}}: Add logo Wordmark and Tagline for hywiki ([[phab:T259985|T259985]]) (duration: 00m 56s)
* 17:51 mutante: all new parse* parsoid hardware pooled now and set to active in netbox, deploy in 10 min will add to $wgLinterSubmitterWhitelist ([[phab:T247441|T247441]])
* 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
* 17:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
* 17:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2002.codfw.wmnet
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
* 16:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
* 16:36 mutante: pooled the first of the new parsoid servers - parse2001 ([[phab:T247441|T247441]])
* 16:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
* 16:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
* 16:04 elukey: completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster
* 16:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
* 16:01 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[0-2][0-9].codfw.wmnet
* 15:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
* 15:58 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
* 15:54 moritzm: restarting apache on webperf* to pick up GNU TLS security update
* 15:45 moritzm: restarting apache/FPM on mw2271/mw2272 (codfw canaries) to pick up GNU TLS update
* 15:35 moritzm: installing gnutls28 security updates on stretch
* 15:23 elukey: enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005
* 15:17 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:626229{{!}}Allow public access to API Portal main page for private launch]] (duration: 00m 57s)
* 15:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:11 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:11 cmjohnson1: completed pdu swap in eqiad racks d5/d6
* 14:55 elukey: ferm rules added to kafka-jumbo1009, 1006 and 1008 up to now
* 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:16 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:11 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:42 moritzm: installing dbus security updates on stretch
* 13:42 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:32 moritzm: installing websockify stretch updates
* 13:10 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:51 cmjohnson1: correction it's replacing the pdu's in racks d5 and d6
* 12:50 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1438 --new-data-type external-id ([[phab:T262198|T262198]])
* 12:49 cmjohnson1: replacing pdu's in racks d4 and d5 eqiad
* 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-snmp (exit_code=1)
* 12:30 ayounsi@cumin1001: START - Cookbook sre.pdus.rotate-snmp
* 12:30 XioNoX: rotate SNMP community on all the PDUs - [[phab:T246890|T246890]]
* 12:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:24 moritzm: rebooting sodium for kernel update
* 12:09 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:08 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 12:06 akosiaris: [[phab:T187984|T187984]] migration script on otrs1001 now in step 31/44
* 12:03 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fea8861db550746bfef496df2ef522dffc580a7d}}: Follow-up {{Gerrit|0ee0d8f}}: [frwiktionary] Create `conj` alias ([[phab:T262298|T262298]]) (duration: 00m 56s)
* 11:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:48 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:48 volans@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:46 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:45 volans@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:41 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:41 volans@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:40 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:39 volans@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:36 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:35 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 11:27 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for MCR', diff saved to https://phabricator.wikimedia.org/P12578 and previous config saved to /var/cache/conftool/dbconfig/20200914-112648-marostegui.json
* 11:24 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 11:20 marostegui: Remove triggers from db1124:3311 - [[phab:T238966|T238966]]
* 11:19 marostegui: Deploy MCR schema change on s1, this will generate lag on s1 labsdb - [[phab:T238966|T238966]]
* 11:13 Urbanecm: EU B&C window done
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|47fe87c5756f9e4d1aad059925a5b289322460c5}}: [itwiki] Increase $wgAutoConfirmAge and $wgAutoConfirmCount ([[phab:T262738|T262738]]) (duration: 00m 56s)
* 11:09 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts [[phab:T261455|T261455]]
* 11:05 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # [[phab:T262298|T262298]] # P12576
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ee0d8f7422afe9c4ce215613c1dd212da85a466}}: [frwiktionary] Create new namespace "Conjugaison" & associated talk ([[phab:T262298|T262298]]) (duration: 00m 56s)
* 11:00 volans: Mass importing IPs from PuppetDB into Netbox [[phab:T244153|T244153]]
* 10:59 XioNoX: create LACP bundle to labtestvirt2003
* 10:50 jbond42: enable git protocol version 2 fleet-wide
* 10:43 effie: deploy scap 3.15.0-1 to canaries - [[phab:T261234|T261234]]
* 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 09:27 akosiaris: [[phab:T187984|T187984]] migration script on otrs1001 now in step 8/44 (correction)
* 09:26 akosiaris: [[phab:T187984|T187984]] migration script on otrs1001 now in step 8/41
* 09:09 akosiaris: db1077. stop slave ; show slave status > /home/akosiaris/show_slave_status; reset slave all [[phab:T187984|T187984]]
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2026 on es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12575 and previous config saved to /var/cache/conftool/dbconfig/20200914-085842-marostegui.json
* 08:49 akosiaris: start the OTRS upgrade to 6.0.29 [[phab:T187984|T187984]]
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12574 and previous config saved to /var/cache/conftool/dbconfig/20200914-084509-marostegui.json
* 08:42 moritzm: upgrading remaining stretch systems to git 2.20 [[phab:T262244|T262244]]
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12573 and previous config saved to /var/cache/conftool/dbconfig/20200914-083525-marostegui.json
* 08:17 _joe_: restarting pybal on lvs2009
* 08:16 _joe_: repooling mw2297
* 08:14 _joe_: restarting php on mw2297, php-fpm stuck in SIGILL
* 08:14 marostegui: Stop MySQL on db2125 for on-site maintenance - [[phab:T260670|T260670]]
* 08:12 _joe_: restarting pybal on lvs2010
* 08:09 _joe_: restarting pybal on lvs1015
* 08:05 godog: prometheus codfw ops, extend the lv by 100G
* 08:04 marostegui: Stop MySQL on es2017 to clone es2027
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 to clone es2027 - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12572 and previous config saved to /var/cache/conftool/dbconfig/20200914-080344-marostegui.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2018 as es3 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12571 and previous config saved to /var/cache/conftool/dbconfig/20200914-080239-marostegui.json
* 07:58 _joe_: restarting pybal on lvs1015
* 07:52 _joe_: restarting pybal on lvs1016
* 07:40 jayme: shutting down etcd100[1-3] (scheduled for decommission, replaced by kubetcd100[4-6])
* 07:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:39 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12570 and previous config saved to /var/cache/conftool/dbconfig/20200914-073919-marostegui.json
* 06:56 elukey: slowly roll out ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168)
* 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 05:54 elukey: execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - [[phab:T258768|T258768]]
* 05:54 marostegui: Truncate tendril.general_log_sampled on db1115 - [[phab:T262782|T262782]]
* 05:47 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 for the first time with minimum weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12569 and previous config saved to /var/cache/conftool/dbconfig/20200914-053844-marostegui.json
== 2020-09-13 ==
* 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation ([[phab:T262733|T262733]])
* 05:51 effie: sudo -i depool mw2297
== 2020-09-12 ==
* 01:07 mutante: people2001 - rsyncing user home dirs from people1002
* 00:38 mutante: all issues with hosts doing stuff "on every run" have been fixed; only one is left: analytics1034
== 2020-09-11 ==
* 22:54 mutante: starting people2001 VM
* 17:30 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:22 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:12 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:27 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:55 jynus: starting snapshot of m2 from db1117
* 08:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 07:59 XioNoX: remove BGP to AS64271 in AMS-IX (see peering@ email)
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:17 moritzm: rebooting ldap-corp server for kernel update
* 07:02 moritzm: remove git-core from stretch systems, it's a transition package no longer provided by the 2.20 backport from Buster
* 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 02:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:54 mutante: downtimes 48h for parse* hosts not in production yet but getting icinga checks from applied role
* 01:53 mutante: ACKed alerts for eqiad power switches after making [[phab:T262629|T262629]]
* 01:53 mutante: initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP ([[phab:T247441|T247441]])
* 01:45 mutante: mw2296 - restarted php7.2-fpm
* 01:42 mutante: mw2296 - systemctl restart apache2 - rescheduled icinga alerts for apache and php-fpm
* 01:33 mutante: initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP ([[phab:T247441|T247441]])
* 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix (duration: 00m 07s)
* 01:32 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix
* 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20]: Simple hql syntax fix (duration: 08m 09s)
* 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:24 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20]: Simple hql syntax fix
* 00:41 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca] (duration: 00m 08s)
* 00:41 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca]
* 00:40 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca] (duration: 08m 25s)
* 00:38 mutante: generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 ([[phab:T247441|T247441]])
* 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:31 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca]
* 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:01 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
== 2020-09-10 ==
* 23:44 ejegg: updated payments-wiki from {{Gerrit|e41ab173e0}} to {{Gerrit|3c073a6a56}}
* 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:43 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:31 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:11 ejegg: updated payments-wiki from {{Gerrit|be81063168}} to {{Gerrit|e41ab173e0}}
* 22:06 mutante: added mcrouter cert for parse2020, ran mcrouter_generate_certs
* 21:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.8
* 20:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:20 longma: correction: [[phab:T257976|T257976]] - 1.36.0-wmf.8 to all wikis
* 20:20 longma: deploying 1.36.0-wmf.8 to all wikis
* 20:02 krinkle@deploy1001: Synchronized php-1.36.0-wmf.8/includes/resourceloader/ResourceLoaderSkinModule.php: {{Gerrit|Ibe2c9f8d024f6}} (duration: 01m 05s)
* 19:44 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # [[phab:T262163|T262163]]
* 19:12 mholloway-shell@deploy1001: Started restart [recommendation-api/deploy@db7fd80]: (no justification provided)
* 19:07 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # [[phab:T262163|T262163]]
* 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|95d2b575c683a1c5c2972a9bf0cf3b87059fbd74}}: Set $wgCategoryCollation = uca-tr on trwiktionary ([[phab:T262163|T262163]]) (duration: 01m 05s)
* 18:58 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # [[phab:T262398|T262398]]
* 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|09e487e76158026ba161acffad277928d2603891}}: Add a new namespace to frwiktionary ([[phab:T262398|T262398]]) (duration: 01m 04s)
* 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/includes/EditPage.php: {{Gerrit|824094428c5f41dc9eef7d65c8440dadda4d4dbd}}: EditPage: Fix member call on boolean when undo is impossible ([[phab:T262463|T262463]]) (duration: 01m 03s)
* 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/includes/EditPage.php: {{Gerrit|824094428c5f41dc9eef7d65c8440dadda4d4dbd}}: EditPage: Fix member call on boolean when undo is impossible ([[phab:T262463|T262463]]) (duration: 01m 07s)
* 18:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: {{Gerrit|0cde0b15fc1daca2cef904bc7add7e9a1c58e3c9}}: Add throttle rule for Czech senior citizens course ([[phab:T262415|T262415]]) (duration: 01m 05s)
* 18:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:00 mutante: helium (former backup host) is being removed from ferm rules on all hosts, it was replaced by backup1001 ([[phab:T260717|T260717]])
* 17:33 bblack: dns servers: upgrading remainder of fleet to gdnsd-3.3.0-1~wmf1
* 16:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:25 bblack: authdns1001 - upgrade gdnsd to 3.3.0-1~wmf1
* 16:06 bblack: dns4001 - upgrade gdnsd to 3.3.0-1~wmf1
* 16:04 bblack: reprepro: uploaded gdnsd-3.3.0-1~wmf1 - [[phab:T261340|T261340]]
* 15:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:04 volans: uploaded cumin_4.0.0 to apt.wikimedia.org buster-wikimedia (no code changes)
* 13:58 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:52 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:42 moritzm: rebooting etherpad1002 (etherpad.wikimedia.org) for kernel update
* 13:24 moritzm: installing rake security updates on stretch
* 13:10 ebernhardson: delete lldwiki_<nowiki>{</nowiki>content{{!}}general<nowiki>}</nowiki> indices from search.svc.<nowiki>{</nowiki>eqiad{{!}}codfw<nowiki>}</nowiki>.wmnet:9643 (psi), they should be on 9443 (omega)
* 12:57 klausman: Ran puppet-merge to get my dotfiles from https://gerrit.wikimedia.org/r/c/operations/puppet/+/626367 out
* 12:34 moritzm: installing firejail updates on maps/thumbor/restbase
* 12:01 moritzm: upgrading deployment servers to git 2.20 [[phab:T262244|T262244]]
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P12557 and previous config saved to /var/cache/conftool/dbconfig/20200910-113758-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P12556 and previous config saved to /var/cache/conftool/dbconfig/20200910-113426-marostegui.json
* 11:13 matthiasmullie: Euro B&C done
* 11:13 moritzm: uploaded git 2.20.1-2+deb10u3~wmf1 to stretch-wikimedia/main [[phab:T262244|T262244]]
* 11:11 mlitn@deploy1001: Synchronized php-1.36.0-wmf.8//extensions/WikimediaEvents/: WikimediaEvents: Enable MediaSearch A/B test (duration: 01m 06s)
* 10:42 duesen_: daniel@mwmaint2001:~$ mwscript maintenance/findBadBlobs.php jvwiki --revisions 214173 --mark [[phab:T262457|T262457]]
* 10:34 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:32 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 XioNoX: move VRRP master to cr2-esams
* 10:21 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:45 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12555 and previous config saved to /var/cache/conftool/dbconfig/20200910-093106-marostegui.json
* 09:26 dcausse: creating missing cirrus indices for jawikivoyage [[phab:T262518|T262518]]
* 09:24 dcausse: creating missing cirrus indices for jawikivoyage [[phab:T260228|T260228]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12554 and previous config saved to /var/cache/conftool/dbconfig/20200910-091335-marostegui.json
* 08:49 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:47 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12551 and previous config saved to /var/cache/conftool/dbconfig/20200910-082304-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12550 and previous config saved to /var/cache/conftool/dbconfig/20200910-073107-marostegui.json
* 07:03 elukey: resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - [[phab:T262385|T262385]]
* 05:29 marostegui: Deploy schema change on s3 master - [[phab:T260476|T260476]]
* 00:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master (duration: 06m 42s)
* 00:24 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master
* 00:23 twentyafterfour: done. Phabricator update complete
* 00:23 twentyafterfour: applying database migrations to phabricator db
* 00:09 twentyafterfour: deploying phabricator update 2020-09-10 https://phabricator.wikimedia.org/project/view/4755/
== 2020-09-09 ==
* 23:51 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915 (duration: 00m 05s)
* 23:51 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915
* 23:37 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/CirrusSearch/includes/Search/InterleavedResultSet.php: Repair passing interleaved search metrics from backend to frontend (duration: 01m 04s)
* 20:13 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:625914 (duration: 01m 03s)
* 20:03 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:626190 [[phab:T261425|T261425]] (duration: 01m 03s)
* 20:01 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.8/skins/WikimediaApiPortal: Backport gerrit:626044, [[phab:T261425|T261425]] (duration: 01m 12s)
* 19:11 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.8 (duration: 01m 03s)
* 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.8
* 18:19 _joe_: banning urls ^/api/rest_v1/page/mobile-html-offline-resources/ from varnish caches
* 18:19 Urbanecm: Morning B&C window done
* 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b226330c1b3bd3dae113e375e2afb4d6af774cde}}: Enable $wgAllowCrossOrigin on all wikis ([[phab:T262425|T262425]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|85e36ae12e7467a559e3d52c58cc3a71ffd09ded}}: Enable MediaWiki client errors on commonswiki and metawiki ([[phab:T255585|T255585]]) (duration: 01m 06s)
* 18:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], feed timeout (duration: 02m 55s)
* 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], feed timeout
* 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], feed timeout (duration: 06m 47s)
* 17:52 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], feed timeout
* 17:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], take 2 (duration: 09m 38s)
* 17:42 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 [[phab:T262437|T262437]], take 2
* 17:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 [[phab:T262437|T262437]] (duration: 06m 00s)
* 17:35 ppchelko@deploy1001: Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 [[phab:T262437|T262437]]
* 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:28 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:25 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:24 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:22 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:15 marostegui: Stop mysql on db2125 for on-site maintenance [[phab:T260670|T260670]]
* 16:10 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]]) [take 3] (duration: 00m 11s)
* 16:10 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]]) [take 3]
* 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:06 bd808: scap3 of Striker to labweb1001 failing. Will investigate.
* 16:05 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]]) [take 2] (duration: 00m 11s)
* 16:05 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]]) [take 2]
* 16:04 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]]) (duration: 01m 21s)
* 16:03 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag ([[phab:T262323|T262323]], [[phab:T144111|T144111]])
* 15:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:11 herron: prometheus1003: systemctl restart thanos-sidecar@ops.service
* 14:29 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:22 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:00 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:57 marostegui: Restart mysql on db1115 [[phab:T231769|T231769]]
* 13:54 bblack: deployed https://gerrit.wikimedia.org/r/626153
* 12:47 _joe_: restarting php-fpm on wtp2003
* 12:46 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 12:37 cmjohnson1: beginning scheduled PDU maintenance racks D5 and D6 in eqiad
* 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12545 and previous config saved to /var/cache/conftool/dbconfig/20200909-123634-kormat.json
* 12:31 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12544 and previous config saved to /var/cache/conftool/dbconfig/20200909-123109-kormat.json
* 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:11 moritzm: installing zeromq security updates on Buster
* 12:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:37 awight: EU Bacon complete
* 11:34 awight@deploy1001: Synchronized wmf-config: Config: [[gerrit:624750{{!}}api-portal: required extended configuration (T261425)]] (duration: 01m 08s)
* 11:15 moritzm: added Tobias Klausmann to pwstore
* 11:14 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:03 marostegui: Stop MySQL on s2 eqiad master to prepare for the PDU maintenance (this will generate lag on s2 on labsdb) [[phab:T261453|T261453]]
* 10:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:28 volans: restarting ferm on failed hosts: an-test-master1001.eqiad.wmnet,an-worker1116.eqiad.wmnet,db[1075,1101,1116].eqiad.wmnet,labstore1007.wikimedia.org,logstash[1025,1030].eqiad.wmnet left over from yesterday's network issue
* 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:11 klausman: Rebooting stat1005 for clearing GPU status and testing new DKMS driver ([[phab:T260442|T260442]])
* 10:09 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12542 and previous config saved to /var/cache/conftool/dbconfig/20200909-100157-kormat.json
* 09:52 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12541 and previous config saved to /var/cache/conftool/dbconfig/20200909-095219-kormat.json
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12540 and previous config saved to /var/cache/conftool/dbconfig/20200909-093353-kormat.json
* 09:26 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12539 and previous config saved to /var/cache/conftool/dbconfig/20200909-092621-kormat.json
* 09:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:11 moritzm: installing qemu security updates on Buster
* 09:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 08:53 _joe_: restarting restbase on rb2009 (depooled)
* 08:53 godog: upgrade kibana to 7.9.1 on the logstash7 cluster
* 08:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12538 and previous config saved to /var/cache/conftool/dbconfig/20200909-085147-kormat.json
* 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12537 and previous config saved to /var/cache/conftool/dbconfig/20200909-084433-kormat.json
* 08:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 08:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json
* 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:30 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json
* 08:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 07:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews ([[phab:T262240|T262240]]) (duration: 01m 22s)
* 07:25 elukey: restart varnishkafka-webrequest on cp5010 and cp5012, delivery report errors happening since yesterday's network outage
* 06:21 XioNoX: push new pfw policies - [[phab:T262297|T262297]]
* 01:58 eileen: civicrm revision changed from {{Gerrit|4e40a59d42}} to {{Gerrit|cc1f7e6d13}}, config revision is {{Gerrit|4845a229dc}}
== 2020-09-08 ==
* 23:47 eileen: civicrm revision is {{Gerrit|4e40a59d42}}, config revision is {{Gerrit|d26334fa36}}
* 23:25 eileen: civicrm revision changed from {{Gerrit|5e7352e2c3}} to {{Gerrit|4e40a59d42}}, config revision is {{Gerrit|3cf0913789}}
* 22:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:12 andrew@deploy1001: Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s)
* 22:08 andrew@deploy1001: Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update
* 22:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:57 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s)
* 21:57 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks
* 19:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8
* 19:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s)
* 18:22 elukey: rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - [[phab:T260305|T260305]]
* 18:00 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.8
* 17:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 17:54 Amir1: Deployed patch for [[phab:T262240|T262240]]
* 17:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:23 andrewbogott: rebooting cloudvirt1033
* 17:03 klausman: attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs
* 16:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - [[phab:T251609|T251609]] (duration: 00m 58s)
* 16:34 herron: increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag)
* 16:12 longma: 1.36.0-wmf.8 was branched at {{Gerrit|e81e81e91473cc8259c473165863aca8ecea2784}} for [[phab:T257976|T257976]]
* 16:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 16:03 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 16:02 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:34 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1004.*
* 15:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.*
* 15:30 elukey: roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 15:20 _joe_: restarted celery-ores-worker.service on ores1007
* 15:19 _joe_: restarted ferm on wdqs1011
* 15:18 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 15:16 _joe_: starting wdqs-updater on wdqs1005
* 15:15 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
* 15:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet
* 15:14 bblack: repool cp1087-90 (eqiad row D)
* 15:13 herron: rolling restart of elk5 logstashes
* 15:10 marostegui: Start mysql on db1106 after PDU maintenance is done
* 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.*
* 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes1004.*
* 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 4 port 0
* 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 0 member 2 port 50
* 15:02 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 1 port 1
* 14:53 marostegui: Reload dbproxy1016 to recover the alert
* 14:45 jynus: restarting bacula-dir @ backup1001
* 14:44 XioNoX: reboot asw2-d3-eqiad
* 14:33 moritzm: bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts
* 14:31 volans: restarted ssh on mc1033 from console
* 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 1 member 4 port 0
* 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 0 member 2 port 50
* 14:13 akosiaris: drain kubernetes1013, kubernetes1004. They are on row D
* 14:13 bblack: dns1002 - disable puppet + bird service (stop advertising recdns from row D)
* 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet
* 13:59 bblack: depooling cp1087-1090
* 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet
* 13:57 XioNoX: asw2-d-eqiad> request system reboot member 3
* 13:35 cmjohnson1: asw2-d3-eqiad lost power because its power cable was not properly seated
* 13:34 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 13:30 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:25 mateusbs17: Restarted puppetdb on deployment-puppetdb03 ([[phab:T248041|T248041]])
* 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:20 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:18 cmjohnson1: swapping PDUs in eqiad; mgmt for racks d3 and d4 will go down
* 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 13:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 13:14 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12523 and previous config saved to /var/cache/conftool/dbconfig/20200908-123546-kormat.json
* 12:34 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12522 and previous config saved to /var/cache/conftool/dbconfig/20200908-122702-kormat.json
* 12:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12521 and previous config saved to /var/cache/conftool/dbconfig/20200908-121139-kormat.json
* 12:04 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12520 and previous config saved to /var/cache/conftool/dbconfig/20200908-120419-kormat.json
* 12:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 11:18 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:53 marostegui: Deploy schema change on s3 eqiad master - [[phab:T253276|T253276]]
* 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 10:20 marostegui: Deploy schema change on s4 eqiad master - [[phab:T253276|T253276]]
* 10:14 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 10:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:11 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12519 and previous config saved to /var/cache/conftool/dbconfig/20200908-100852-kormat.json
* 09:52 akosiaris: enable puppet, run it on all k8s eqiad nodes and double check that calico-node is fine [[phab:T239835|T239835]]
* 09:43 akosiaris: stopped calico-node and kube-apiserver on k8s nodes/masters [[phab:T239835|T239835]]
* 09:43 marostegui: Stop mysql on es2014 to clone es2026 [[phab:T261717|T261717]]
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12517 and previous config saved to /var/cache/conftool/dbconfig/20200908-093957-marostegui.json
* 09:37 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs (#2), [[phab:T261489|T261489]]"
* 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:28 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12515 and previous config saved to /var/cache/conftool/dbconfig/20200908-092755-kormat.json
* 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:20 jayme: disabling puppet on argon.eqiad.wmnet,chlorine.eqiad.wmnet,kubernetes[1001-1016].eqiad.wmnet - Reinitialize eqiad k8s cluster with new etcd - [[phab:T239835|T239835]]
* 08:55 marostegui: Deploy schema change on s7 eqiad master - [[phab:T253276|T253276]]
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2127's weight', diff saved to https://phabricator.wikimedia.org/P12514 and previous config saved to /var/cache/conftool/dbconfig/20200908-084834-marostegui.json
* 08:45 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs, [[phab:T261489|T261489]]"
* 08:23 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
* 08:22 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 08:21 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
* 08:20 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
* 08:16 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
* 07:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update [[phab:T250887|T250887]] mitigations" ([[phab:T250887|T250887]]; [[phab:T262242|T262242]]) (duration: 00m 59s)
* 07:44 elukey: roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades
* 07:40 XioNoX: move HE from ix to transit BGP group on cr3-eqsin
* 07:00 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 06:58 marostegui: Deploy schema change on s2 eqiad master - [[phab:T253276|T253276]]
* 06:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P12513 and previous config saved to /var/cache/conftool/dbconfig/20200908-065022-marostegui.json
* 06:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 06:31 marostegui: Deploy schema change on s5 eqiad master - [[phab:T253276|T253276]]
* 06:23 elukey: roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings
* 06:14 marostegui: Stop MySQL on db1106 for PDU maintenance [[phab:T261452|T261452]]
* 05:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
== 2020-09-07 ==
* 23:35 Reedy: Deployed patch for [[phab:T262213|T262213]]
* 21:19 reedy@deploy1001: Synchronized private/PrivateSettings.php: Remove old mitigation (duration: 00m 55s)
* 18:04 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 00m 56s)
* 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12511 and previous config saved to /var/cache/conftool/dbconfig/20200907-153857-kormat.json
* 15:32 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12510 and previous config saved to /var/cache/conftool/dbconfig/20200907-153206-kormat.json
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12509 and previous config saved to /var/cache/conftool/dbconfig/20200907-152117-kormat.json
* 15:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12508 and previous config saved to /var/cache/conftool/dbconfig/20200907-151718-kormat.json
* 15:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12507 and previous config saved to /var/cache/conftool/dbconfig/20200907-150901-kormat.json
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 moritzm: rebooting poolcounter1004/1005
* 15:03 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P12506 and previous config saved to /var/cache/conftool/dbconfig/20200907-150310-kormat.json
* 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1133 from dbctl [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json
* 14:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:48 _joe_: restarting pybal in codfw to pick up the new mobileapps TLS endpoint
* 13:44 _joe_: restarting pybal in eqiad to pick up the new mobileapps TLS endpoint
* 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:28 hashar@deploy1001: Finished deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # [[phab:T149924|T149924]] (duration: 00m 05s)
* 13:27 hashar@deploy1001: Started deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # [[phab:T149924|T149924]]
* 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 hashar@deploy1001: Finished deploy [integration/docroot@11ab4a0]: (no justification provided) (duration: 00m 10s)
* 13:22 hashar@deploy1001: Started deploy [integration/docroot@11ab4a0]: (no justification provided)
* 13:14 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:04 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 12:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 12:43 kormat@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
* 12:42 kormat@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:29 marostegui: Upgrade and reboot db2094 and db2095 (sanitarium hosts in codfw)
* 12:18 gehel: restarting elasticsearch on elastic2029 (high GC)
* 12:01 volans: restart uwsgi on debmonitor1002 to test db reconnection
* 11:58 marostegui: Reboot pc1008 for upgrade
* 11:36 Urbanecm: EU B&C done
* 11:30 urbanecm@deploy1001: Synchronized docroot/noc/index.html: {{Gerrit|bbfe2ce61014f616d89bc0c21a380c15777b62e3}}: noc: Remove link to outdated blog ([[phab:T259978|T259978]]) (duration: 00m 57s)
* 11:27 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|ff9f1042529bd332effc0fcd18db70f417c2e939}}: Update help URL ([[phab:T256623|T256623]]) (duration: 00m 56s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b512d3a27c4c33949389cbbe7823cc534fbff9a}}: [hewiktionary] Enable wikilove ([[phab:T262181|T262181]]) (duration: 00m 57s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|35224f43f1c461d42da5c963bb60d28fbe1992ee}}: [eswiki] Create an `abusefilter` user group ([[phab:T262174|T262174]]; 2/2) (duration: 00m 57s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|35224f43f1c461d42da5c963bb60d28fbe1992ee}}: [eswiki] Create an `abusefilter` user group ([[phab:T262174|T262174]]; 1/2) (duration: 01m 20s)
* 11:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewiktionary wikilove # [[phab:T262181|T262181]]
* 11:01 marostegui: Reboot pc1007 for upgrade
* 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:36 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 09:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 09:12 dcausse@deploy1001: Finished deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server) (duration: 00m 33s)
* 09:11 dcausse@deploy1001: Started deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server)
* 09:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:02 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:49 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 08:29 jayme@deploy2001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:19 marostegui: Upgrade and restart pc1010
* 08:18 jayme@deploy2001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:10 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:03 marostegui: Compress InnoDB on s8 eqiad master (db1109) - [[phab:T232446|T232446]]
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json
* 04:56 marostegui: Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki [[phab:T254462|T254462]]
* 04:53 marostegui: Deploy schema change on db1109 (eqiad wikidata master) - [[phab:T256685|T256685]]
== 2020-09-06 ==
* 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json
* 08:20 elukey: powercycle mw1360 (mgmt console available, network errors while running anything)
* 08:04 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet
* 08:01 elukey: executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360)
== 2020-09-05 ==
* 00:23 foks: removing 2 files for legal compliance
== 2020-09-04 ==
* 22:15 ryankemper: wdqs deploy complete, service is healthy
* 21:54 ryankemper: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 21:52 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:49 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s)
* 21:37 ryankemper: Tests on canary `wdqs1003` passing, beginning full wdqs deploy
* 21:36 ryankemper@deploy1001: Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47
* 21:31 ryankemper: `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph`
* 21:06 mutante: apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light ([[phab:T261962|T261962]])
* 21:02 mutante: apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo
* 20:59 mutante: apt2001 - sudo apt-get autoremove
* 20:51 mutante: apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light ([[phab:T261962|T261962]])
* 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 mutante: Icinga - ACKing with sticky - alerts on test and dev hosts
* 18:10 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s)
* 18:02 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing
* 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json
* 10:28 marostegui: Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts [[phab:T238966|T238966]]
* 09:48 marostegui: Restart prometheus-mysqld-exporter on db2125
* 09:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 08:31 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 08:29 elukey: roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades
* 08:08 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
* 07:30 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
* 05:13 marostegui: Deploy MCR schema change on s4 eqiad master [[phab:T238966|T238966]]
* 01:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s)
* 01:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:30 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 01:23 ryankemper: (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699)
* 01:16 ryankemper: Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, looks like `wdqs2003`'s blazegraph isn't happy based on the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph`
* 00:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
== 2020-09-03 ==
* 23:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|93947391e97be11a9cd7eb4713b274b05d5b371a}}: Start logging log-ins on select wikis ([[phab:T253802|T253802]]) (duration: 00m 56s)
* 21:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:55 milimetric@deploy1001: deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s)
* 19:54 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
* 19:07 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s)
* 19:07 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149]
* 19:06 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s)
* 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149]
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:28 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:02 papaul: power down ores2009 for DIMM upgrade
* 16:45 papaul: power down ores2008 for DIMM upgrade
* 16:33 papaul: power down ores2007 for DIMM upgrade
* 16:24 elukey: roll restart aqs on aqs1* to pick up new druid settings
* 16:05 papaul: power down ores2006 for DIMM upgrade
* 15:51 papaul: power down ores2005 for DIMM upgrade
* 15:33 papaul: power down ores2004 for DIMM upgrade
* 15:30 moritzm: installing nginx updates on apt* and htmldumper1001
* 15:25 moritzm: installing firejail update (along with restarts) on thumbor1001, maps1001, restbase1016 (and -dev)
* 15:22 papaul: power down ores2003 for DIMM upgrade
* 15:17 moritzm: installing firejail security updates on parsoid servers
* 15:08 papaul: power down ores2002 for DIMM upgrade
* 14:53 papaul: power down ores2001 for DIMM upgrade
* 14:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:30 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:29 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 06s)
* 14:29 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
* 14:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 marostegui: Failover m5 (wikitech) master - [[phab:T260324|T260324]]
* 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:43 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 18s)
* 13:43 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
* 13:40 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me (duration: 01m 29s)
* 13:39 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me
* 13:32 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host (duration: 00m 05s)
* 13:32 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host
* 13:08 marostegui: Start pre m5 failover steps [[phab:T260324|T260324]]
* 12:46 marostegui: Deploy MCR schema change on s7 eqiad master (lag might show up) - [[phab:T238966|T238966]]
* 12:30 hnowlan: enabling puppet on appservers, finished rollout of api.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'Shift weights in s2 codfw to account for db2125 being down [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12485 and previous config saved to /var/cache/conftool/dbconfig/20200903-121916-kormat.json
* 12:17 moritzm: installing openexr security updates for stretch
* 12:03 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2125 after hw issue', diff saved to https://phabricator.wikimedia.org/P12483 and previous config saved to /var/cache/conftool/dbconfig/20200903-120304-kormat.json
* 11:45 moritzm: installing net-snmp security updates on Stretch
* 11:45 moritzm: installing net-snmp security updates on Buster
* 11:33 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix {{!}} phaste # [[phab:T260320|T260320]] # P12481
* 11:28 moritzm: installing PHP 7.0 security updates
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04281a0875d34e1161f44697f732d898ab12d4f0}}: Add extra namespaces for jawikivoyage ([[phab:T260320|T260320]]) (duration: 01m 01s)
* 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: {{Gerrit|976d7350a7252610e4ba34e9227e205d085a609a}}: Lift IP cap on 2020-09-08 for Senior Citizen Write Wikipedia course - cs.wikipedia ([[phab:T261882|T261882]]) (duration: 01m 01s)
* 11:21 gilles@deploy1001: Synchronized static/images/project-logos: [[phab:T252108|T252108]] Deploying lossily optimised Wikipedia logos (duration: 01m 20s)
* 10:50 hnowlan: disabling apache on appservers for rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
* 10:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:07 XioNoX: re-apply vlan 1118 firewall filter and update OSPF/bootp on cr1/2-eqiad - [[phab:T261866|T261866]]
* 09:57 XioNoX: rectification: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 on cr1-eqiad - [[phab:T261866|T261866]]
* 09:56 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - [[phab:T261866|T261866]]
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12480 and previous config saved to /var/cache/conftool/dbconfig/20200903-095510-marostegui.json
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12479 and previous config saved to /var/cache/conftool/dbconfig/20200903-095015-marostegui.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12478 and previous config saved to /var/cache/conftool/dbconfig/20200903-094857-marostegui.json
* 09:48 XioNoX: move VRRP master from cr1-eqiad:ae2.1118 to cr2-eqiad:xe-3/0/4.1118 - [[phab:T261866|T261866]]
* 09:46 XioNoX: move vlan 1118 IPv4 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - [[phab:T261866|T261866]]
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12477 and previous config saved to /var/cache/conftool/dbconfig/20200903-094435-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12476 and previous config saved to /var/cache/conftool/dbconfig/20200903-094043-marostegui.json
* 09:38 XioNoX: move vlan 1118 IPv6 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - [[phab:T261866|T261866]]
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12475 and previous config saved to /var/cache/conftool/dbconfig/20200903-093629-marostegui.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12474 and previous config saved to /var/cache/conftool/dbconfig/20200903-093454-marostegui.json
* 09:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12473 and previous config saved to /var/cache/conftool/dbconfig/20200903-092549-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 db2087:3317 [[phab:T261917|T261917]]', diff saved to https://phabricator.wikimedia.org/P12472 and previous config saved to /var/cache/conftool/dbconfig/20200903-092028-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12471 and previous config saved to /var/cache/conftool/dbconfig/20200903-091834-marostegui.json
* 09:13 XioNoX: rolled back: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - [[phab:T261866|T261866]]
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2122', diff saved to https://phabricator.wikimedia.org/P12470 and previous config saved to /var/cache/conftool/dbconfig/20200903-090901-marostegui.json
* 09:06 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - [[phab:T261866|T261866]]
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P12469 and previous config saved to /var/cache/conftool/dbconfig/20200903-090419-marostegui.json
* 09:01 XioNoX: force ae2.1118 VRRP master on cr1-eqiad - [[phab:T261866|T261866]]
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317, db1098:3316', diff saved to https://phabricator.wikimedia.org/P12468 and previous config saved to /var/cache/conftool/dbconfig/20200903-090007-marostegui.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3317', diff saved to https://phabricator.wikimedia.org/P12467 and previous config saved to /var/cache/conftool/dbconfig/20200903-085838-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12466 and previous config saved to /var/cache/conftool/dbconfig/20200903-085708-marostegui.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12465 and previous config saved to /var/cache/conftool/dbconfig/20200903-084910-marostegui.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P12464 and previous config saved to /var/cache/conftool/dbconfig/20200903-084836-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317, db1090:3312', diff saved to https://phabricator.wikimedia.org/P12463 and previous config saved to /var/cache/conftool/dbconfig/20200903-084358-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12462 and previous config saved to /var/cache/conftool/dbconfig/20200903-084147-marostegui.json
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 [[phab:T261917|T261917]]', diff saved to https://phabricator.wikimedia.org/P12461 and previous config saved to /var/cache/conftool/dbconfig/20200903-082956-marostegui.json
* 08:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:28 moritzm: rebooting mwmaint1002 for kernel update
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12460 and previous config saved to /var/cache/conftool/dbconfig/20200903-082655-marostegui.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12459 and previous config saved to /var/cache/conftool/dbconfig/20200903-082034-marostegui.json
* 08:16 marostegui: Upgrade db1101 (s7 and s8)
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12458 and previous config saved to /var/cache/conftool/dbconfig/20200903-081543-marostegui.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1101:3317', diff saved to https://phabricator.wikimedia.org/P12457 and previous config saved to /var/cache/conftool/dbconfig/20200903-081503-marostegui.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12456 and previous config saved to /var/cache/conftool/dbconfig/20200903-081337-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12455 and previous config saved to /var/cache/conftool/dbconfig/20200903-080714-marostegui.json
* 08:06 marostegui: Upgrade and reboot db1127
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12454 and previous config saved to /var/cache/conftool/dbconfig/20200903-080634-marostegui.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12453 and previous config saved to /var/cache/conftool/dbconfig/20200903-080024-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12452 and previous config saved to /var/cache/conftool/dbconfig/20200903-075443-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12451 and previous config saved to /var/cache/conftool/dbconfig/20200903-074922-marostegui.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 [[phab:T261917|T261917]]', diff saved to https://phabricator.wikimedia.org/P12450 and previous config saved to /var/cache/conftool/dbconfig/20200903-074827-marostegui.json
* 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 07:45 marostegui: Upgrade and reboot db1094
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2121 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12449 and previous config saved to /var/cache/conftool/dbconfig/20200903-074426-marostegui.json
* 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12448 and previous config saved to /var/cache/conftool/dbconfig/20200903-073718-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12447 and previous config saved to /var/cache/conftool/dbconfig/20200903-073116-marostegui.json
* 07:29 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12446 and previous config saved to /var/cache/conftool/dbconfig/20200903-072716-marostegui.json
* 07:24 hashar: contint2001: restarting CI Jenkins for plugins upgrade
* 07:19 marostegui: Deploy schema change on s8 eqiad master [[phab:T237120|T237120]]
* 07:18 marostegui: Stop slave on s8 eqiad master (lag will appear on s8 eqiad) - [[phab:T237120|T237120]]
* 07:02 marostegui: Stop db2100:3317 and db2121 in sync to reload metawiki.content [[phab:T261869|T261869]]
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12445 and previous config saved to /var/cache/conftool/dbconfig/20200903-070104-marostegui.json
* 06:56 hashar: contint2001: restarting CI Jenkins
* 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 06:56 _joe_: deployment of mobileapps to pick up changes to envoy config, new helmfile layout
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2120 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12444 and previous config saved to /var/cache/conftool/dbconfig/20200903-065105-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12443 and previous config saved to /var/cache/conftool/dbconfig/20200903-064804-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12442 and previous config saved to /var/cache/conftool/dbconfig/20200903-064623-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12441 and previous config saved to /var/cache/conftool/dbconfig/20200903-064334-marostegui.json
* 06:24 marostegui: Disconnect eqiad -> codfw replication
== 2020-09-02 ==
* 22:55 shdubsh: restart rsyslog on centrallog[12]001
* 22:27 ryankemper: `sudo cumin -b10 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki> and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
* 22:26 ryankemper: Puppet finished on all external wdqs codfw nodes, nginx automatically reloaded as intended
* 22:24 ryankemper: `sudo cumin -b10 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki> and not A:wdqs-test and not A:wdqs-internal' "sudo run-puppet-agent"`
* 21:48 bd808@deploy1001: Finished deploy [striker/deploy@3c2090a]: Deploying r20200902 tag ([[phab:T198114|T198114]], [[phab:T223610|T223610]], [[phab:T245804|T245804]], [[phab:T144111|T144111]], [[phab:T261810|T261810]]) (duration: 01m 34s)
* 21:46 bd808@deploy1001: Started deploy [striker/deploy@3c2090a]: Deploying r20200902 tag ([[phab:T198114|T198114]], [[phab:T223610|T223610]], [[phab:T245804|T245804]], [[phab:T144111|T144111]], [[phab:T261810|T261810]])
* 21:10 ryankemper: `sudo cumin -b10 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki> and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
* 21:10 ryankemper: `sudo cumin -b10 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki> and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart nginx.service"`
* 21:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:01 ryankemper: Restarted nginx on `wdqs2007`
* 21:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:47 ryankemper: restarted blazegraph on `wdqs2001` as well
* 20:46 ryankemper: `sudo cumin -b10 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki> and not A:wdqs-test and not A:wdqs-internal and not P<nowiki>{</nowiki>wdqs2001.codfw.wmnet<nowiki>}</nowiki>' "sudo systemctl restart wdqs-blazegraph.service"` (restarted everything but 2001, will restart 2001 next)
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:20 robh: scs-c1-eqiad firmware update complete and back online [[phab:T238036|T238036]]
* 19:14 robh: updating firmware on scs-c1-eqiad via [[phab:T238036|T238036]]
* 19:14 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update [[phab:T250887|T250887]] mitigations" (duration: 00m 32s)
* 18:58 herron: freeing some disk space on centrallog1001 with 'tune2fs -m 0 /dev/centrallog1001-vg/data'
* 18:43 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled, ouch, forgot to rebase (duration: 00m 55s)
* 18:40 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled (duration: 00m 55s)
* 18:38 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka jumbo-eqiad (for consistency with main) - [[phab:T261865|T261865]]
* 18:37 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-codfw - [[phab:T261865|T261865]]
* 18:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:622897 Install OAuthRateLimiter extension II: Add flag to IS (duration: 00m 56s)
* 18:34 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-eqiad - [[phab:T261865|T261865]]
* 18:33 ppchelko@deploy1001: Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 54s)
* 18:32 ottomata: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka jumbo-eqiad (for consistency with main) - [[phab:T261865|T261865]]
* 18:28 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport [[gerrit:623561{{!}}Fix parsing localised digits in PHP discussion parser]] (duration: 00m 56s)
* 18:19 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport [[gerrit:623560{{!}}Re-apply new reply API patches (again)]] (duration: 00m 58s)
* 17:34 bstorm: re-enabled puppet on labsdb10[09-12]
* 17:28 bstorm: disabled puppet on labsdb10[09-12]
* 17:18 herron: restarted elasticsearch on logstash1012
* 16:39 Pchelolo: creating oauth_ratelimit_client_tier table [[phab:T258711|T258711]]
* 15:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
* 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
* 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
* 15:32 hnowlan: Temporarily disabling apache for configuration change [[phab:T246945|T246945]]
* 15:24 godog: prometheus codfw lvextend --resizefs --size +50G /dev/mapper/vg--ssd-prometheus--k8s
* 15:19 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 15:18 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
* 15:18 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
* 15:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:16 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
* 15:15 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
* 15:15 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main
* 15:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:31 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main eqiad - [[phab:T261865|T261865]]
* 14:29 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main codfw - [[phab:T261865|T261865]]
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 [[phab:T261869|T261869]]', diff saved to https://phabricator.wikimedia.org/P12434 and previous config saved to /var/cache/conftool/dbconfig/20200902-141854-marostegui.json
* 13:05 elukey: run kafka preferred-replica-election on kafka-main codfw
* 12:07 XioNoX: move vrrp master from cr2-codfw to cr1-codfw
* 11:52 duesen__: daniel@mwmaint2001:/srv/mediawiki/php-1.36.0-wmf.6$ mwscript findBadBlobs.php testwiki --mark [[phab:T251778|T251778]]
* 11:36 Urbanecm: EU B&C done
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|796b4fa8d561986a20ad5c9671b696809fa09b67}}: Add title for apiportalwiki ([[phab:T246945|T246945]]) (duration: 00m 56s)
* 11:34 Urbanecm: Fetched extra commits to deploy1001's staging dir; commit messages explain it was an accident, continuing; cc Krinkle
* 11:31 duesen__: Deployed second security fix for [[phab:T260485|T260485]]
* 11:07 XioNoX: repool cr1-eqiad
* 10:58 XioNoX: cr1-eqiad:request chassis routing-engine master switch
* 10:49 XioNoX: reboot cr1-eqiad:re0 (backup)
* 10:45 jbond42: install apache updates on buster
* 10:36 XioNoX: cr1-eqiad:request chassis routing-engine master switch
* 10:35 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
* 10:34 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
* 10:32 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
* 10:31 jbond42: install apache updates on jessie
* 10:27 XioNoX: reboot cr1-eqiad:re1 (backup)
* 10:18 XioNoX: move VRRP master from cr1 to cr2
* 10:16 XioNoX: drain cr1-eqiad transit/transport/IX
* 10:13 XioNoX: drain cr1-eqiad-pfw3-eqiad link
* 10:04 XioNoX: repool cr2-eqiad
* 09:55 XioNoX: cr2-eqiad:request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:46 XioNoX: reboot cr2-eqiad:re0 (backup) - [[phab:T259621|T259621]]
* 09:28 XioNoX: cr2-eqiad:request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:18 XioNoX: reboot cr2-eqiad:re1 (backup) - [[phab:T259621|T259621]]
* 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
* 09:12 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:11 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 09:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
* 09:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
* 09:01 elukey: reimage kafka-jumbo1004 to Buster
* 08:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from s10 - [[phab:T260324|T260324]]', diff saved to https://phabricator.wikimedia.org/P12432 and previous config saved to /var/cache/conftool/dbconfig/20200902-085705-marostegui.json
* 08:52 XioNoX: deactivate cr2-eqiad transit/IX - [[phab:T259621|T259621]]
* 08:50 XioNoX: drain cr2-eqiad transport links - [[phab:T259621|T259621]]
* 08:20 XioNoX: activate Telia BGP in eqiad
* 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 07:38 elukey: reimage kafka-jumbo1003 to buster
* 07:28 marostegui: Reboot dbstore1003 for kernel upgrade - [[phab:T261389|T261389]]
* 07:12 XioNoX: configure cr2-eqiad:ae5 as single LACP link to Telia
* 07:05 marostegui: Drop unused grants on m5 [[phab:T261152|T261152]]
* 07:02 elukey: reboot kafka-jumbo1002 to pick up new kernel settings
* 07:00 XioNoX: deactivate Telia BGP in eqiad
* 06:38 elukey: powercycle analytics1059 - cpu soft locks on multiple CPUs
* 06:30 elukey: reboot kafka-jumbo1001 to pick up new kernel settings
* 06:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
== 2020-09-01 ==
* 22:39 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=sysop_itwiki Pierpao ([[phab:T261722|T261722]])
* 17:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 17:36 ryankemper: wdqs [canary] rollback complete, tests passing now. Will need to dig into source of failure
* 17:35 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@7920fbe]: 0.3.46 (duration: 03m 43s)
* 17:35 ryankemper: `wdqs1003` (the canary instance) is failing tests now, going to rollback
* 17:32 ryankemper@deploy1001: Started deploy [wdqs/wdqs@7920fbe]: 0.3.46
* 17:30 ryankemper: Starting wdqs deploy
* 15:56 chasemp: labsdb* puppet agent --test; sudo /usr/local/sbin/maintain-views --all-databases --table user --replace-all; sudo /usr/local/sbin/maintain-views --all-databases --table user_old --replace-all
* 15:25 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:28 _joe_: restarting envoy on all eqiad jobrunners
* 14:22 _joe_: restarted confd on mwmaint1002
* 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 14:18 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2083 weight', diff saved to https://phabricator.wikimedia.org/P12429 and previous config saved to /var/cache/conftool/dbconfig/20200901-141521-marostegui.json
* 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:07 rzl@cumin1001: MediaWiki read-only period ends at: 2020-09-01 14:07:36.305500
* 14:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:02 rzl@cumin1001: MediaWiki read-only period starts at: 2020-09-01 14:02:04.851006
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 13:58 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 13:58 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 13:51 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:45 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:44 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 13:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 10:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:48 XioNoX: reserve cr2-eqiad:xe-3/3/7 for new Telia port
* 09:38 jayme: systemctl restart docker-reporter-releng-images.service on deneb to clear out alert because of temporary HTTP 504 from debmonitor
* 09:01 moritzm: installing Java 8 sec updates on contint*
* 08:51 moritzm: uploaded apache 2.4.10-10+deb8u16+wmf1 for jessie-wikimedia
* 07:11 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
* 07:05 moritzm: restarting jenkins on releases1002 to pick up Java security updates
* 06:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 06:44 elukey: reimage kafka-jumbo1002 to Buster
* 06:20 marostegui: Install query killers on db2137:3314 [[phab:T243373|T243373]]
* 01:17 chaomodus: updated the pynetbox package to 5.0.7 and uploaded to buster
* 00:02 mutante: wb2-grrrri was not running and wikibugs had not received any Gerrit updates for a while
* 00:01 mutante: restarting wikibugs
== 2020-08-31 ==
* 23:38 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final) (duration: 00m 17s)
* 23:38 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final)
* 23:37 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001 (duration: 01m 12s)
* 23:36 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001
* 23:36 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001 (duration: 00m 58s)
* 23:35 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001
* 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2 (duration: 00m 05s)
* 23:31 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2
* 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next (duration: 00m 57s)
* 23:30 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next
* 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable (future) mw-reverted tag for all wikis except testwiki ([[phab:T254074|T254074]]) (duration: 00m 57s)
* 21:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:20 ryankemper: `sudo systemctl restart elasticsearch_6@production-search-psi-eqiad.service` on `elastic1052.eqiad.wmnet`
* 18:38 Urbanecm: Morning B&C done
* 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|16197aabc88f098568a04984a20149de3b7fdeaf}}: Add two domains to wgCopyUploadsDomains for commonswiki ([[phab:T261562|T261562]]; [[phab:T261575|T261575]]) (duration: 00m 54s)
* 18:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bb28e9da8057a4c92cd4d564ffd000f320338cda}}: itwiki: Assign patrol right to autopatrolled instead of autoconfirmed ([[phab:T261587|T261587]]) (duration: 00m 53s)
* 18:23 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|a1b0d6e4e7da9bf45ae7381d2c1d9814e6b36498}}: {{Gerrit|b609cd53273e922cd8af5507660b9d10c6da09b3}}: CommonSettings.php: limit new Echo's `push-subscription-manager` group to Meta-Wiki ([[phab:T261625|T261625]]) (duration: 00m 54s)
* 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|846c5448f950b4d0d7eedce570e46d74ca62ca38}}: wgEventStreams: Stream for MEP-iOS pilot ([[phab:T260382|T260382]]) (duration: 00m 55s)
* 17:21 volans: uploaded spicerack_0.0.42 to apt.wikimedia.org buster-wikimedia
* 15:50 rzl@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 15:49 ejegg: updated payments-wiki from {{Gerrit|ef7ebd08cb}} to {{Gerrit|be81063168}}
* 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=99)
* 15:32 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 14:58 ema: Traffic: depool eqiad from user traffic [[phab:T243316|T243316]]
* 14:38 moritzm: installing rake security updates on stretch
* 14:33 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:21 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 14:20 rzl@cumin1001: Switching services apertium, termbox, search, api-gateway, ores, sessionstore, eventgate-main, graphoid, eventstreams, wikifeeds, wdqs, parsoid, eventgate-logging-external, wdqs-internal, echostore, mathoid, mobileapps, proton, restbase, kartotherian, recommendation-api, eventgate-analytics-external, restbase-async, citoid, schema, cxserver, eventgate-analytics, zotero: eqiad => codfw
* 14:20 rzl@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
* 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 14:13 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:12 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=99)
* 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 13:41 andrewbogott: dropping many databases from m5, as per [[phab:T261152|T261152]]
* 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:07 marostegui: Failover m3 (phabricator) proxy from dbproxy1016 to dbproxy1020 - [[phab:T261459|T261459]]
* 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:54 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 12:54 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 12:53 oblivian@cumin2001: Switching services parsoid: eqiad => codfw
* 12:53 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
* 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 12:48 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 12:45 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 12:45 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 12:44 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 12:44 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
* 12:44 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
* 12:43 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 12:37 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 12:14 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 12:14 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 12:13 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 12:13 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
* 12:13 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
* 12:10 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 12:05 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 11:58 elukey: reimage kafka-jumbo1001 to Buster
* 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: {{Gerrit|5d583d9550787a8e36c29ca841233615405fcb7e}}: Disable MediaSearch A/B test (duration: 00m 55s)
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f88fde2aad23a619047b1177a6188f51df11a9}}: Enable Signature button on Wikiproject for hywiki ([[phab:T261550|T261550]]) (duration: 00m 54s)
* 11:22 jbond42: removing old hiera version 1 and 3 backends
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b74893fecdaae599077daad5b1219ad3b9bc7fc9}}: Enable sitenotice on mobile for closed wikis ([[phab:T261357|T261357]]) (duration: 00m 56s)
* 11:02 volans: upgraded spicerack to 0.0.41 on cumin hosts
* 10:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:51 elukey: executed /srv/phab/phabricator/bin/remove destroy @klausman on phab1001 (following https://wikitech.wikimedia.org/wiki/Phabricator#Delete_a_user) to clear inconsistent state of new account (wrong email address)
* 08:43 moritzm: installing bind9 security updates on stretch/buster (client-side tools/libs only)
* 07:53 volans: uploaded spicerack_0.0.41 to apt.wikimedia.org buster-wikimedia
* 07:30 moritzm: installing squid security updates
* 07:24 moritzm: installing openexr security updates on buster
* 07:12 marostegui: Sanitize jawikivoyage on db2094:3325 and db1124:3325 [[phab:T260482|T260482]]
* 06:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 06:06 elukey: reimage kafka-jumbo1005 to Debian Buster
* 05:21 marostegui: Reload haproxy on dbproxy1017 and dbproxy1021 to test db1128
== 2020-08-30 ==
* 16:13 herron: restarted eqiad v5 logstashes
== 2020-08-29 ==
* 18:05 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T261451|T261451]])
* 17:45 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T261451|T261451]])
== 2020-08-28 ==
* 21:53 ryankemper: `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues
* 19:11 andrewbogott: rebooting cloudvirt1006. It's a spare, unused system but showing a bus error and Icinga alerts; not worth saving if it needs saving
* 17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:39 mutante: shutting down mw2196
* 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:40 rzl: switchdc live test complete
* 16:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 16:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 16:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 16:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 16:33 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 16:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 16:29 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463
* 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 16:28 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663
* 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 16:19 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 16:19 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 16:13 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 16:12 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 16:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 16:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 16:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 16:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 16:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 16:06 rzl: starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise
* 14:22 moritzm: installing Java security updates on kafka/main and Logstash(5) clusters
* 13:35 hashar@deploy1001: Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s)
* 13:35 hashar@deploy1001: Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md
* 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:07 elukey: stop kafka on kafka-jumbo1006 and reimage to buster
* 12:56 moritzm: installing debmonitor1002 [[phab:T261492|T261492]]
* 12:46 moritzm: installing debmonitor2002 [[phab:T261492|T261492]]
* 11:50 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:27 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 09:48 jayme: updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
* 09:19 jayme: imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
* 08:22 kormat: enabling replication from db2112 to db1083 (s1) [[phab:T243373|T243373]]
* 07:41 jynus: restart backup2001,backup1002
* 07:10 jynus: restart db2139
* 07:07 marostegui: Warm up parsercache in codfw - [[phab:T260042|T260042]]
* 06:47 jynus: restart db2102
* 06:28 jynus: restart db2100
* 06:07 jynus: restart db2099
* 05:50 jynus: restart db2098
* 00:06 eileen: process-control config revision is {{Gerrit|dd541a25dc}}
== 2020-08-27 ==
* 23:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:48 eileen: civicrm revision changed from {{Gerrit|a942537984}} to {{Gerrit|3d501e71d9}}, config revision is {{Gerrit|dd541a25dc}}
* 22:54 eileen: civicrm revision changed from {{Gerrit|481ab742db}} to {{Gerrit|a942537984}}, config revision is {{Gerrit|e2ab4d7c1f}}
* 22:28 tzatziki: removing one file for legal compliance
* 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:18 volans: uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:22 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:10 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
* 20:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet
* 20:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - [[phab:T259714|T259714]] (duration: 00m 55s)
* 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:57 marxarelli: 1.36.0-wmf.6 promoted to all wikis ([[phab:T257974|T257974]]). new errors appear to be related to [[phab:T261345|T261345]] but are known since 1.36.0-wmf.5
* 19:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet
* 19:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
* 19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 19:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 19:16 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating apiportalwiki ([[phab:T246945|T246945]])
* 19:15 urbanecm@deploy1001: Synchronized dblists: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 19:14 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 19:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 19:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki ([[phab:T246945|T246945]]) (duration: 01m 03s)
* 18:54 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s)
* 18:54 mforns@deploy1001: Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
* 18:53 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s)
* 18:43 mforns@deploy1001: Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
* 18:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s)
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki ([[phab:T257490|T257490]]) (duration: 01m 03s)
* 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary
* 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary
* 18:16 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:16 cdanis@cumin1001: START - Cookbook sre.network.cf
* 18:14 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet
* 18:07 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver
* 18:06 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver
* 18:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver
* 18:03 Urbanecm: Creating jawikivoyage is done ([[phab:T260320|T260320]])
* 18:02 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
* 18:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver
* 18:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage ([[phab:T260320|T260320]]) (duration: 01m 02s)
* 17:59 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary
* 17:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet
* 17:59 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating jawikivoyage ([[phab:T260320|T260320]]) (duration: 01m 03s)
* 17:58 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating jawikivoyage ([[phab:T260320|T260320]])
* 17:57 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet
* 17:56 urbanecm@deploy1001: Synchronized dblists: Creating jawikivoyage ([[phab:T260320|T260320]]) (duration: 00m 58s)
* 17:56 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary
* 17:55 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet
* 17:55 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating jawikivoyage ([[phab:T260320|T260320]]) (duration: 01m 03s)
* 17:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet
* 17:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet
* 17:54 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage ([[phab:T260320|T260320]]) (duration: 01m 07s)
* 17:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet
* 17:52 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet
* 17:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet
* 17:50 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet
* 17:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet
* 17:47 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet
* 17:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet
* 17:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet
* 17:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet
* 17:31 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary
* 17:30 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet
* 17:29 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:29 cdanis@cumin1001: START - Cookbook sre.network.cf
* 17:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:17 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet
* 17:13 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet
* 17:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:11 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet
* 17:04 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet
* 17:01 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet
* 16:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet
* 16:54 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet
* 16:49 mutante: re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 ([[phab:T261159|T261159]])
* 16:48 mutante: depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 ([[phab:T260654|T260654]])
* 16:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet
* 16:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet
* 16:35 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:21 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary
* 16:14 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet
* 16:12 elukey: remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744)
* 16:09 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary
* 16:08 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet
* 16:06 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet
* 16:05 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet
* 15:51 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet
* 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary
* 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary
* 15:41 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet
* 15:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1297.eqiad.wmnet
* 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet
* 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet
* 14:48 moritzm: installing Java security updates on aqs, hadoop and kafka-jumbo
* 14:44 moritzm: restarting tomcat on idp-test* hosts to pick up Java update
* 14:42 elukey: add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705)
* 14:37 moritzm: imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update)
* 14:31 papaul: replacing msw-c5,c6,c7 and fmsw-c8
* 13:58 kormat: disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) [[phab:T243373|T243373]]
* 13:56 kormat: disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) [[phab:T243373|T243373]]
* 13:54 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:53 kormat: disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) [[phab:T243373|T243373]]
* 13:52 kormat: disabling GTID on db2123 (s5) [[phab:T243373|T243373]]
* 13:52 kormat: disabling GTID on db2090 (s4) [[phab:T243373|T243373]]
* 13:51 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:51 kormat: disabling GTID on db2105 (s3) [[phab:T243373|T243373]]
* 13:50 kormat: disabling GTID on db2107 (s2) [[phab:T243373|T243373]]
* 13:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:29 elukey: restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries)
* 13:18 kormat: enabling replication from db2107 to db1122 (s2) [[phab:T243373|T243373]]
* 13:14 kormat: enabling replication from db2096 to db1103 (x1) [[phab:T243373|T243373]]
* 13:10 jynus: restart db2097
* 13:07 jbond42: deploy python3.4 security update to kraz
* 13:03 jbond42: deploy python3.4 security update to canaries on jessie
* 13:01 kormat: enabling replication from db2118 to db1086 (s7) [[phab:T243373|T243373]]
* 12:52 jynus: restart db1140
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12402 and previous config saved to /var/cache/conftool/dbconfig/20200827-124338-marostegui.json
* 12:35 jynus: restart db1139
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12401 and previous config saved to /var/cache/conftool/dbconfig/20200827-123028-marostegui.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12400 and previous config saved to /var/cache/conftool/dbconfig/20200827-123003-marostegui.json
* 12:24 marostegui: Fix password format in db2129 (s6 codfw master) [[phab:T243373|T243373]]
* 12:14 kormat: enabling replication from db2129 to db1093 (s6) [[phab:T243373|T243373]]
* 12:13 jynus: restart db1095
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12399 and previous config saved to /var/cache/conftool/dbconfig/20200827-120816-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 codfw weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12398 and previous config saved to /var/cache/conftool/dbconfig/20200827-120211-marostegui.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 eqiad weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12397 and previous config saved to /var/cache/conftool/dbconfig/20200827-115934-marostegui.json
* 11:56 Urbanecm: Lift range blocks exceeding wgBlockCIDRLimit via custom script from F32197596 (ruwiki, ruwikiquote; [[phab:T243980|T243980]])
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s4 codfw weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12396 and previous config saved to /var/cache/conftool/dbconfig/20200827-115110-marostegui.json
* 11:49 moritzm: uploaded python3.4 3.4.2-1+deb8u7+wmf1 for jessie-wikimedia [[phab:T259102|T259102]]
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 codfw weights [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12395 and previous config saved to /var/cache/conftool/dbconfig/20200827-114509-marostegui.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2126 weight [[phab:T243373|T243373]]', diff saved to https://phabricator.wikimedia.org/P12394 and previous config saved to /var/cache/conftool/dbconfig/20200827-112213-marostegui.json
* 11:12 Urbanecm: EU B&C done
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|34994d39f92b23934929c66f3e15aa332683e746}}: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki ([[phab:T131300|T131300]]) (duration: 01m 03s)
* 10:57 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:56 godog: bounce grafana to apply new settings
* 10:51 kormat: enabling replication from db2123 to db1100 (s5) [[phab:T243373|T243373]]
* 10:48 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:30 kormat: enabling replication from es2023 to es1024 (es5) [[phab:T243373|T243373]]
* 10:28 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:23 kormat: enabling replication from es2021 to es1021 (es4) [[phab:T243373|T243373]]
* 10:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:03 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:54 moritzm: installing Java security updates on IDP* hosts
* 09:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:43 elukey: decommissioning vms schema[12]00[12] (replaced previously by schema[12]00[34] buster vms)
* 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:41 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:20 kormat: enabling replication from db2105 to db1123 (s3) [[phab:T243373|T243373]]
* 09:15 kormat: enabling replication from db2079 to db1109 (s8) [[phab:T243373|T243373]]
* 09:07 kormat: enabling replication from db2090 to db1081 (s4) [[phab:T243373|T243373]]
* 08:53 kormat: enabling replication from pc2009 to pc1009 (pc3) [[phab:T243373|T243373]]
* 08:44 kormat: enabling replication from pc2008 to pc1008 (pc2) [[phab:T243373|T243373]]
* 08:13 marostegui: Enable replication codfw -> eqiad on pc1 [[phab:T243373|T243373]]
* 08:01 gehel: manual cleanup of stale wdqs deploy crontab on wdqs1009
* 07:35 marostegui: Move pc2010 under pc2007 [[phab:T243373|T243373]]
* 07:16 moritzm: installing ghostscript security updates on stretch
* 06:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 06:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 06:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 06:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12392 and previous config saved to /var/cache/conftool/dbconfig/20200827-060652-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12391 and previous config saved to /var/cache/conftool/dbconfig/20200827-055815-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12390 and previous config saved to /var/cache/conftool/dbconfig/20200827-055522-marostegui.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12389 and previous config saved to /var/cache/conftool/dbconfig/20200827-055126-marostegui.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12388 and previous config saved to /var/cache/conftool/dbconfig/20200827-055104-marostegui.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12387 and previous config saved to /var/cache/conftool/dbconfig/20200827-054259-marostegui.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 db1085 db1078', diff saved to https://phabricator.wikimedia.org/P12386 and previous config saved to /var/cache/conftool/dbconfig/20200827-054114-marostegui.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12385 and previous config saved to /var/cache/conftool/dbconfig/20200827-053814-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12384 and previous config saved to /var/cache/conftool/dbconfig/20200827-053558-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12383 and previous config saved to /var/cache/conftool/dbconfig/20200827-053509-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12382 and previous config saved to /var/cache/conftool/dbconfig/20200827-053100-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P12381 and previous config saved to /var/cache/conftool/dbconfig/20200827-052925-marostegui.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P12380 and previous config saved to /var/cache/conftool/dbconfig/20200827-052818-marostegui.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P12379 and previous config saved to /var/cache/conftool/dbconfig/20200827-052413-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12378 and previous config saved to /var/cache/conftool/dbconfig/20200827-051609-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12377 and previous config saved to /var/cache/conftool/dbconfig/20200827-051546-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12376 and previous config saved to /var/cache/conftool/dbconfig/20200827-050754-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P12375 and previous config saved to /var/cache/conftool/dbconfig/20200827-050727-marostegui.json
* 04:53 marostegui: Stop db1074 and db2107 in sync to fix drifts on s2 change_tag - [[phab:T260042|T260042]]
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P12374 and previous config saved to /var/cache/conftool/dbconfig/20200827-045329-marostegui.json
* 04:04 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1006.wikimedia.org
* 04:03 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1005.wikimedia.org
* 04:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cloudelastic1005.wikimedia.org
* 02:03 mutante: shutting down install3001,install4001,install5001 VMs (no OS yet, but please also don't delete, debugging in progress, shutting them down until I continue on [[phab:T254157|T254157]])
== 2020-08-26 ==
* 23:35 eileen: civicrm revision changed from {{Gerrit|d2e80f7522}} to {{Gerrit|481ab742db}}, config revision is {{Gerrit|e2ab4d7c1f}}
* 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 22:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:51 XioNoX: standardize pfw3-eqiad
* 19:33 marxarelli: 1.36.0-wmf.6 promoted to group1 ([[phab:T257974|T257974]]). logs show no new errors
* 19:24 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.6 (duration: 01m 03s)
* 19:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.6
* 18:21 Urbanecm: Morning B&C done
* 18:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|945b97cff8b8a1e4bb43b613fc93b099f74945f7}}: Added import sources for mlwiktionary ([[phab:T260716|T260716]]) (duration: 01m 05s)
* 18:12 Urbanecm: Purge Thai and Greek taglines, URLs are at P12372 ([[phab:T258552|T258552]])
* 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40092898d8c70191324e844d2c222469b954e9ef}}: Update Thai and Greek taglines ([[phab:T258552|T258552]]) (duration: 01m 03s)
* 18:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|40092898d8c70191324e844d2c222469b954e9ef}}: Update Thai and Greek taglines ([[phab:T258552|T258552]]) (duration: 01m 05s)
* 18:08 herron: upgraded eqiad elk v7 cluster from 7.8.0 to 7.9.0 [[phab:T234854|T234854]]
* 18:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client side error logging on hewiki ([[phab:T255585|T255585]]) (duration: 01m 04s)
* 17:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Documentation-only change; sync for line sanity (duration: 01m 04s)
* 17:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T254349|T254349]] Set wgVisualEditorEnableBetaFeature true on wikis that need it (duration: 01m 03s)
* 15:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 15:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 15:41 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 15:11 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for MCR change', diff saved to https://phabricator.wikimedia.org/P12371 and previous config saved to /var/cache/conftool/dbconfig/20200826-145612-marostegui.json
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12370 and previous config saved to /var/cache/conftool/dbconfig/20200826-145531-marostegui.json
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12369 and previous config saved to /var/cache/conftool/dbconfig/20200826-144750-marostegui.json
* 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1002.eqiad.wmnet
* 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1001.eqiad.wmnet
* 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2001.codfw.wmnet
* 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2002.codfw.wmnet
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12368 and previous config saved to /var/cache/conftool/dbconfig/20200826-143623-marostegui.json
* 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2003.codfw.wmnet
* 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2004.codfw.wmnet
* 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1004.eqiad.wmnet
* 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1003.eqiad.wmnet
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12367 and previous config saved to /var/cache/conftool/dbconfig/20200826-142746-marostegui.json
* 14:25 jgleeson: updated civicrm from {{Gerrit|0f195c6cca}} to {{Gerrit|d2e80f7522}}
* 14:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:20 marostegui: Upgrade mysql on db1091 after MCR changes
* 14:13 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:37 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 100% [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json
* 13:18 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php dewiki --mark [[phab:T205936|T205936]] --revisions - < ~/T205936-dewiki-20050512070000.ids # marking known bad revisions for [[phab:T205936|T205936]]
* 13:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 75% [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json
* 13:16 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php oswiki --mark [[phab:T205936|T205936]] --revisions - < ~/T205936-oswiki-20090309200000.ids # marking known bad revisions for [[phab:T205936|T205936]]
* 13:07 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 50% [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json
* 13:06 vgutierrez: serve a synthetic warn page to DHE-RSA-AES128-SHA users - [[phab:T258405|T258405]]
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 30% [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json
* 12:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 20% [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json
* 12:12 godog: upgrade nagios-nrpe-server to 2.15-2 on jessie hosts - [[phab:T261198|T261198]]
* 11:58 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1110 [[phab:T261276|T261276]]', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json
* 11:56 mlitn@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 00s)
* 11:55 mlitn@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 08s)
* 11:53 kart_: Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 ([[phab:T261189|T261189]])
* 11:39 kart_: Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 ([[phab:T261189|T261189]])
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:599344{{!}}Enable propagateChangeVisibility for testwikidata]], part 2 (duration: 01m 03s)
* 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:599344{{!}}Enable propagateChangeVisibility for testwikidata]], part 1 (duration: 01m 19s)
* 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 XioNoX: re-enable IPv6 BGP to Init7 in knams
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 replication broken', diff saved to https://phabricator.wikimedia.org/P12360 and previous config saved to /var/cache/conftool/dbconfig/20200826-084044-marostegui.json
* 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for MCR change', diff saved to https://phabricator.wikimedia.org/P12358 and previous config saved to /var/cache/conftool/dbconfig/20200826-054557-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12357 and previous config saved to /var/cache/conftool/dbconfig/20200826-054409-marostegui.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12356 and previous config saved to /var/cache/conftool/dbconfig/20200826-053345-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12355 and previous config saved to /var/cache/conftool/dbconfig/20200826-052355-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12354 and previous config saved to /var/cache/conftool/dbconfig/20200826-050849-marostegui.json
* 05:03 marostegui: Update db1135 and db1114 after MCR changes
== 2020-08-25 ==
* 21:51 mutante: xhgui1001/xhgui2001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ([[phab:T260397|T260397]])
* 21:50 mutante: xhgui1001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ...
* 21:46 mutante: importing xhgui 0.12.0-2-wmf1 to buster-wikimedia APT repo ([[phab:T260397|T260397]])
* 19:40 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import (duration: 00m 54s)
* 19:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import
* 19:15 marxarelli: 1.36.0-wmf.6 promoted to group0 ([[phab:T257974|T257974]]). no new errors
* 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.6
* 19:05 moritzm: installing Java security updates on cloudelastic* hosts
* 19:02 moritzm: installing Java security updates on elastic* hosts
* 18:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 17:58 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.6 (duration: 41m 58s)
* 17:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import (duration: 01m 52s)
* 17:28 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import
* 17:17 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.6
* 17:08 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.4 (duration: 01m 40s)
* 17:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.3 (duration: 19m 12s)
* 17:01 herron: imported logstash, elasticsearch, and kibana 7.9.0 -oss packages into buster-wikimedia thirdparty/elastic79
* 16:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import (duration: 00m 49s)
* 16:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import
* 16:21 shdubsh: restart logstash on logstash1007 -- gc duration outlier
* 16:08 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import (duration: 00m 54s)
* 16:07 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import
* 16:00 gehel: repool wdqs1005 - caught up on lag
* 15:47 elukey: restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication)
* 15:44 jgleeson: fundraising-tools updated from {{Gerrit|dcad0bfe75}} to {{Gerrit|3fe3a23114}}
* 15:41 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import (duration: 01m 38s)
* 15:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import
* 15:22 liw: testing upcoming Scap release on beta
* 14:56 moritzm: installing rake security updates on stretch
* 14:56 moritzm: installing take security updates on stretch
* 14:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:32 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
* 14:32 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 14:26 XioNoX: disable IPv6 BGP to Init7 in knams
* 14:10 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: add hostname checking --bug [[phab:T207538|T207538]] (duration: 03m 50s)
* 14:06 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: add hostname checking --bug [[phab:T207538|T207538]]
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for MCR change', diff saved to https://phabricator.wikimedia.org/P12347 and previous config saved to /var/cache/conftool/dbconfig/20200825-135248-marostegui.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'fully repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12346 and previous config saved to /var/cache/conftool/dbconfig/20200825-134736-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12345 and previous config saved to /var/cache/conftool/dbconfig/20200825-133734-marostegui.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12344 and previous config saved to /var/cache/conftool/dbconfig/20200825-132027-marostegui.json
* 13:17 moritzm: installing firejail security updates on remaining mw* servers in eqiad
* 12:56 godog: upgrade nagios-nrpe-server on scb2* and mwlog* - [[phab:T261198|T261198]]
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12343 and previous config saved to /var/cache/conftool/dbconfig/20200825-125108-marostegui.json
* 12:45 marostegui: Update MySQL on db1111 after MCR change
* 12:39 marostegui: alter table sites on s6, directly on the primary master [[phab:T260476|T260476]]
* 12:39 godog: test nagios-nrpe-server with dh 2048 on scb2001 - [[phab:T261198|T261198]]
* 12:35 moritzm: imported ceph packages from stretch-backports to component/ceph [[phab:T256877|T256877]]
* 12:10 moritzm: installing ruby-json security updates
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 MCR change', diff saved to https://phabricator.wikimedia.org/P12341 and previous config saved to /var/cache/conftool/dbconfig/20200825-120708-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12340 and previous config saved to /var/cache/conftool/dbconfig/20200825-120211-marostegui.json
* 11:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12339 and previous config saved to /var/cache/conftool/dbconfig/20200825-114938-marostegui.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12338 and previous config saved to /var/cache/conftool/dbconfig/20200825-113758-marostegui.json
* 11:36 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12337 and previous config saved to /var/cache/conftool/dbconfig/20200825-112859-marostegui.json
* 11:25 marostegui: Upgrade mysql on db1118 after MCR change
* 11:16 Urbanecm: EU B&C done
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d869e308492ee72cb3d1998b15409aa44a4af9c7}}: Enable ContentTranslation as a default tool in Assamese and Burmese WPs ([[phab:T258503|T258503]]; [[phab:T258505|T258505]]) (duration: 01m 00s)
* 10:59 moritzm: installing remaining libx11 security updates
* 10:37 arturo: import all binary packages from tesseract-ocr-lang into stretch-wikimedia/component/tesseract-410-bpo ([[phab:T247422|T247422]])
* 10:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:23 moritzm: removed fermium.wikimedia.org from debmonitor
* 09:45 marostegui: Create missing table cx_notification_log on x1 wikishared [[phab:T261190|T261190]]
* 08:50 XioNoX: re-activate eqord peering/transit - [[phab:T259593|T259593]]
* 08:19 XioNoX: reconfigure eqord to be AS65020 - [[phab:T259593|T259593]]
* 08:18 XioNoX: deactivate eqord peering/transit - [[phab:T259593|T259593]]
* 07:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
* 07:13 marostegui: Upgrade MySQL on dbstore1004
* 07:09 dcausse: depooling wdqs1005 (high lag)
* 07:04 dcausse: restarting blazegraph on wdqs1005 ([[phab:T242453|T242453]])
* 06:20 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111, db1118 for MCR change', diff saved to https://phabricator.wikimedia.org/P12336 and previous config saved to /var/cache/conftool/dbconfig/20200825-053856-marostegui.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12335 and previous config saved to /var/cache/conftool/dbconfig/20200825-053801-marostegui.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12334 and previous config saved to /var/cache/conftool/dbconfig/20200825-052602-marostegui.json
* 05:21 moritzm: installing Java security updates on relforge*
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12333 and previous config saved to /var/cache/conftool/dbconfig/20200825-051327-marostegui.json
* 05:11 marostegui: Remove revisions triggers from db2094:3311 [[phab:T238966|T238966]]
* 05:10 marostegui: Deploy MCR schema change on s1 codfw, this will create lag on s1 codfw - [[phab:T238966|T238966]]
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12332 and previous config saved to /var/cache/conftool/dbconfig/20200825-050451-marostegui.json
* 04:02 ejegg: updated fundraising python tools from {{Gerrit|305f2a4438}} to {{Gerrit|dcad0bfe75}}
* 01:49 eileen: civicrm revision changed from {{Gerrit|ce28723709}} to {{Gerrit|0f195c6cca}}, config revision is {{Gerrit|96839009f1}}
* 01:39 eileen: civicrm revision is {{Gerrit|ce28723709}}, config revision is {{Gerrit|96839009f1}}
* 01:30 eileen: civicrm revision is {{Gerrit|ce28723709}}, config revision is {{Gerrit|54c8c7abf2}}
* 01:17 cdanis: repool esams
* 01:11 cdanis: [[phab:T259621|T259621]] wrong junos version was staged on cr2-esams, abandoning this attempt and putting back in service
* 01:07 cdanis: cdanis@re0.cr2-esams> request system software add validate re1 /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz
* 00:56 cdanis: [[phab:T259621|T259621]] ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request chassis routing-engine master switch
* 00:30 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request vmhost reboot re0
* 00:24 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0
* 00:18 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request chassis routing-engine master switch
* 00:14 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request vmhost reboot re1
* 00:08 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1
== 2020-08-24 ==
* 23:46 cdanis: depool esams [[phab:T259621|T259621]]
* 23:16 Urbanecm: Evening B&C window done
* 23:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|778f710bbbdb24730f7ce4c75d5ff1ca7a5ce3b3}}: Alternate configuration mechanism for Parsoid ([[phab:T241961|T241961]]) (duration: 00m 58s)
* 22:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 22:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission
* 21:29 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deployed additional mitigations for [[phab:T257687|T257687]] (duration: 00m 58s)
* 20:29 rzl: re-enabled puppet on 'R:File = /etc/nutcracker/nutcracker.yml' [[phab:T261154|T261154]]
* 19:25 rzl: disabling puppet on 'R:File = /etc/nutcracker/nutcracker.yml' to swap mc2028 out for mc2037 [[phab:T261154|T261154]]
* 18:10 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Increase weight of grants and research namespaces in metawiki search (duration: 00m 58s)
* 15:20 jynus: shutdown backup2001 [[phab:T260764|T260764]]
* 15:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:08 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:04 vgutierrez: rolling restart of ats-tls to disable ECDHE-RSA-AES128-SHA - [[phab:T258405|T258405]]
* 14:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:55 rzl: switchover test complete, puppet re-enabled on cumin1001
* 14:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 14:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:53 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:52 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:48 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:47 godog: powercycle ganeti5002 -- host down and nothing in console
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:43 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-24 14:43:35.570234
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:42 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
* 14:42 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:42 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:41 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-24 14:41:55.754938
* 14:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 14:41 dcausse: creating cirrus indices for lldwiki
* 14:39 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:24 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:24 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 14:22 moritzm: installing libexif security updates on stretch
* 14:18 rzl: disabling puppet on cumin1001 and starting a test of the DC switchover automation, expect some SAL noise but no production impact
* 14:08 duesen: Deployed patch for [[phab:T260485|T260485]]
* 13:59 marostegui: Stop mysql on db1117:3325 to clone db1128 - [[phab:T260324|T260324]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for MCR change', diff saved to https://phabricator.wikimedia.org/P12327 and previous config saved to /var/cache/conftool/dbconfig/20200824-135538-marostegui.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after MCR change', diff saved to https://phabricator.wikimedia.org/P12326 and previous config saved to /var/cache/conftool/dbconfig/20200824-133032-marostegui.json
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12325 and previous config saved to /var/cache/conftool/dbconfig/20200824-131305-marostegui.json
* 13:05 moritzm: installing imagemagick security updates on stretch
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12323 and previous config saved to /var/cache/conftool/dbconfig/20200824-130024-marostegui.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12322 and previous config saved to /var/cache/conftool/dbconfig/20200824-125131-marostegui.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for MCR change', diff saved to https://phabricator.wikimedia.org/P12321 and previous config saved to /var/cache/conftool/dbconfig/20200824-122848-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 after MCR change', diff saved to https://phabricator.wikimedia.org/P12320 and previous config saved to /var/cache/conftool/dbconfig/20200824-122752-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12319 and previous config saved to /var/cache/conftool/dbconfig/20200824-122050-marostegui.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12318 and previous config saved to /var/cache/conftool/dbconfig/20200824-121200-marostegui.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12317 and previous config saved to /var/cache/conftool/dbconfig/20200824-120310-marostegui.json
* 12:01 Urbanecm: EU B&C window completed
* 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8c380d65d760591099c296ae522b2e63953413aa}}: Enable tewiki as import source for tewikibooks ([[phab:T260107|T260107]]) (duration: 00m 57s)
* 11:58 XioNoX: test advertise CF tunnel endpoint on cr1-eqiad - [[phab:T259036|T259036]]
* 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5a6d025b04eb20787e8abbbdd56a3abb3818b82f}}: Add retrobibliothek.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T261012|T261012]]) (duration: 00m 56s)
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e1ae39afbb4d6f33e74782580db7dfee06d0097d}}: Enable mapframe at trwiki ([[phab:T260594|T260594]]) (duration: 00m 58s)
* 11:43 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: {{Gerrit|1066ecbe2836e69211c905f597ad6b62241528c0}}: Enable MediaSearch A/B test ([[phab:T254388|T254388]]) (duration: 00m 56s)
* 11:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/ContentTranslation/modules/publish/ext.cx.wikibase.link.js: {{Gerrit|74a87184408937bcdb4a27f1f563bbbdff45cf97}}: Publish: Fix broken wikidata linking ([[phab:T249458|T249458]]) (duration: 00m 58s)
* 11:39 Urbanecm: Purge 13 URLs with purgeList.php, see P12316 for list of them ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]])
* 11:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:32 arturo: add liblept5 1.76.0-1~bpo9+1 (and leptonica-progs) to stretch-wikimedia/component/tesseract-410-bpo ([[phab:T247422|T247422]])
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe0449d244ee876e4fb64da630f0994ab114f248}}: {{Gerrit|74220d0943e6b32cce3c93dd5b9f8bbc63fa5d73}}: {{Gerrit|7db8a19c512cea84f3000463e9dfb6617857c9a6}}: Update Chinese wordmarks and taglines, update zhwikisource project logo ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]]) (duration: 00m 59s)
* 11:29 urbanecm@deploy1001: Synchronized static/images/: {{Gerrit|fe0449d244ee876e4fb64da630f0994ab114f248}}: {{Gerrit|74220d0943e6b32cce3c93dd5b9f8bbc63fa5d73}}: {{Gerrit|7db8a19c512cea84f3000463e9dfb6617857c9a6}}: Update Chinese wordmarks and taglines, update zhwikisource project logo ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]]) (duration: 00m 58s)
* 11:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 10:43 moritzm: installing ruby2.3 security updates
* 10:12 moritzm: installing firejail security updates on mw canaries
* 09:58 oblivian@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=appserver,service=canary
* 09:46 XioNoX: add PNI to CF on cr1-eqiad with import/export NONE - [[phab:T259036|T259036]]
* 09:18 moritzm: restarting mw canaries to pick up libx11 update
* 09:13 moritzm: installing libx11 security updates on stretch
* 09:10 vgutierrez: repool cp5002
* 09:08 _joe_: restarting php-fpm on mw1344 (stuck in SIGILL for new children)
* 09:00 vgutierrez: restart ats-tls on cp5002
* 08:54 moritzm: installing net-snmp security updates on buster
* 08:52 ema: depool cp5002 due to icinga errors
* 08:24 moritzm: installing json-c security updates on buster
* 07:36 XioNoX: push new pfw policies - [[phab:T261007|T261007]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1105:3311 for MCR change', diff saved to https://phabricator.wikimedia.org/P12315 and previous config saved to /var/cache/conftool/dbconfig/20200824-052916-marostegui.json
== 2020-08-23 ==
* 20:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 gehel: repool wdqs1006 - caught up on lag
== 2020-08-22 ==
* 19:33 ryankemper: depooled wdqs1006 (still has 2.5 hours to catch up on)
* 19:31 ryankemper: pooled wdqs1006 now that lag has dissipated
* 07:36 gehel: restart blazegraph on wdqs1006 + depool to catch up on lag
* 05:24 legoktm: legoktm@mwmaint1002:~$ echo "https://releases.wikimedia.org/mediawiki/1.35/" {{!}} mwscript purgeList.php --wiki=aawiki
== 2020-08-21 ==
* 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:17 zpapierski@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s)
* 16:16 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification
* 16:15 zpapierski@deploy1001: deploy aborted: .. (duration: 00m 01s)
* 16:15 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: ..
* 13:25 jayme@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw
* 13:25 jayme@cumin1001: conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 09:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 01:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
== 2020-08-20 ==
* 22:31 eileen: civicrm revision changed from {{Gerrit|27d5900f7d}} to {{Gerrit|ce28723709}}, config revision is {{Gerrit|706cf3c898}}
* 22:20 eileen: civicrm revision is {{Gerrit|27d5900f7d}}, config revision is {{Gerrit|706cf3c898}}
* 22:20 mutante: permanently shut down tungsten.eqiad.wmnet [[phab:T260395|T260395]] [[phab:T158837|T158837]] [[phab:T180761|T180761]] [[phab:T224549|T224549]]
* 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:35 ejegg: updated fundraising CiviCRM from {{Gerrit|958a79f660}} to {{Gerrit|27d5900f7d}}
* 20:53 cdanis: repool eqsin
* 20:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:25 cdanis: cdanis@cr2-eqsin> request vmhost reboot
* 20:17 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:13 cdanis: cdanis@cr2-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-18.2R3-S5.3.tgz
* 20:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:02 cdanis: depool eqsin for router upgrade
* 19:57 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
* 19:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:24 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:17 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 19:17 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 19:08 mutante: restarted apache on cont2001 for integration.wikimedia.org docroot change
* 19:07 mutante: switching document root of integration.wikimedia.org to scap ([[phab:T149924|T149924]])
* 19:02 twentyafterfour: 1.36.0-wmf.5 has no known blockers and logspam is cleaned up, time to roll group2 wikis to wmf.5
* 18:42 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 18:42 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:19 mutante: ores1004 - starting failed celery-ores-worker
* 18:18 mutante: testreduce1001 - rt_client and vd_client now properly stopped by puppet [[phab:T257906|T257906]]
* 17:29 shdubsh: restart elasticsearch on logstash1012 (not 1020) -- high gc runtimes
* 17:28 shdubsh: restart elasticsearch on logstash1020 -- high gc runtimes
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 17:23 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 17:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 17:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 16:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:48 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:43 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:40 _joe_: restarted apache2 on icinga1001
* 16:13 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:11 shdubsh: restart elasticsearch on logstash1011 -- long gc runs
* 16:10 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:08 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:06 oblivian@deploy1001: Finished deploy [ores/deploy@8540eec]: various configuration fixes (duration: 09m 03s)
* 13:57 oblivian@deploy1001: Started deploy [ores/deploy@8540eec]: various configuration fixes
* 13:53 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:53 oblivian@deploy1001: Finished deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 14m 00s)
* 13:39 oblivian@deploy1001: Started deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy [[phab:T244843|T244843]]
* 13:26 oblivian@deploy1001: Finished deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (take 2) (duration: 11m 37s)
* 13:14 oblivian@deploy1001: Started deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (take 2)
* 13:11 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 11m 19s)
* 13:09 gehel: repool wdqs1007 - caught up on lag
* 13:00 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]]
* 12:51 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 07m 03s)
* 12:44 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]]
* 11:49 Lucas_WMDE: EU backport window done
* 11:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/AbuseFilter/includes/AbuseFilterHooks.php: {{Gerrit|d762e7b5526d91fe21e5980bc5e9f3be06a2f85c}}: Use $user param when filtering edits ([[phab:T258717|T258717]]) (duration: 01m 05s)
* 11:41 eileen: civicrm revision changed from {{Gerrit|6c9441a18e}} to {{Gerrit|958a79f660}}, config revision is {{Gerrit|706cf3c898}}
* 11:38 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/AbuseFilter/includes/AbuseFilterHooks.php: {{Gerrit|00da39b6913ac2eab600bbb61258472b60d2cbcb}}: Use $user param when filtering edits ([[phab:T258717|T258717]]) (duration: 01m 05s)
* 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/Wikibase/client/data-bridge/dist/: Backport: [[gerrit:621488{{!}}Don't try to load source maps in production (T260852)]] (duration: 01m 07s)
* 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix testwikidata depicts id & CirrusSearchUserTesting config (duration: 01m 06s)
* 11:07 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=trwiki editor # [[phab:T260899|T260899]]
* 10:58 XioNoX: re-pool codfw - [[phab:T259621|T259621]]
* 10:53 XioNoX: un-drain cr1-codfw - [[phab:T259621|T259621]]
* 10:45 XioNoX: cr1-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 10:26 hashar: Restarted zuul-merger instances on contint1001 and contint2001
* 10:24 hashar@deploy1001: Finished deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events (duration: 00m 24s)
* 10:24 hashar@deploy1001: Started deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events
* 10:21 XioNoX: cr1-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 10:12 XioNoX: reboot cr1-codfw:re1 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:57 XioNoX: bump cr1-codfw OSPF metrics - [[phab:T259621|T259621]]
* 09:51 XioNoX: enable transit/peering and re-set normal OSPF values on cr2-codfw - [[phab:T259621|T259621]]
* 09:41 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:36 eileen: civicrm revision changed from {{Gerrit|cf9fadbeed}} to {{Gerrit|6c9441a18e}}, config revision is {{Gerrit|706cf3c898}}
* 09:33 XioNoX: reboot cr2-codfw:re0 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:18 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:18 kormat: stress-testing db2125 [[phab:T260670|T260670]]
* 09:08 XioNoX: reboot cr2-codfw:re1 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:03 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2125 after host failure [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12303 and previous config saved to /var/cache/conftool/dbconfig/20200820-090313-kormat.json
* 08:52 kormat: removing /usr/bin/check_mariadb.py from all db hosts [[phab:T259516|T259516]]
* 08:52 XioNoX: disable transit/peering on cr2-codfw - [[phab:T259621|T259621]]
* 08:48 XioNoX: bump cr2-codfw OSPF metrics - [[phab:T259621|T259621]]
* 08:44 jynus: running analyze table on db1115's tendril.global_status_log, may cause some stalls on tendril/dbtree [[phab:T260876|T260876]]
* 08:41 XioNoX: depool codfw for routers upgrade - [[phab:T259621|T259621]]
* 08:31 XioNoX: enable transit/peering on cr3-knams - [[phab:T259621|T259621]]
* 08:21 XioNoX: reboot cr3-knams for upgrade - [[phab:T259621|T259621]]
* 08:07 XioNoX: disable transit/peering on cr3-knams - [[phab:T259621|T259621]]
* 07:39 hashar: contint2001: restarted zuul
* 07:29 hashar: contint1001: restarted zuul-merger
* 07:29 hashar@deploy1001: Finished deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - [[phab:T258630|T258630]] (duration: 00m 13s)
* 07:28 hashar@deploy1001: Started deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - [[phab:T258630|T258630]]
* 01:54 ejegg: re-enabled fundraising scheduled jobs
* 00:51 mutante: ms-be1039 - started failed ferm service
* 00:35 ejegg: stopped fundraising scheduled jobs
* 00:27 eileen: civicrm revision changed from {{Gerrit|c442a09153}} to {{Gerrit|cf9fadbeed}}, config revision is {{Gerrit|3cdffd4fc2}}
== 2020-08-19 ==
* 23:20 Urbanecm: Evening B&C window closed
* 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a80899948c26ca36b970b80fbad07600fe4ce92c}}: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki ([[phab:T260825|T260825]]) (duration: 01m 05s)
* 22:41 eileen: civicrm revision changed from {{Gerrit|34f95a3311}} to {{Gerrit|c442a09153}}, config revision is {{Gerrit|3cdffd4fc2}}
* 21:27 eileen: civicrm revision changed from {{Gerrit|154519cc1f}} to {{Gerrit|34f95a3311}}, config revision is {{Gerrit|3cdffd4fc2}}
* 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
* 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention ([[phab:T259167|T259167]]) (duration: 00m 06s)
* 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention ([[phab:T259167|T259167]])
* 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
* 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. ([[phab:T257906|T257906]])
* 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]] (duration: 01m 04s)
* 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60af096b80a8ef7bc94ec40ce203fd27b0c97f26}}: Add autopatrolled group at arzwiki ([[phab:T260761|T260761]]) (duration: 01m 04s)
* 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
* 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|924a03bd624d6750a7e776e09713056cc45e5cc5}}: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T259927|T259927]]) (duration: 01m 04s)
* 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|83b34e1bd1ed804a70f67e089580e082f89e2a0f}}: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication ([[phab:T258695|T258695]]) (duration: 01m 04s)
* 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|95d45f6e002df78d4860a711042d77a6b0bdecb9}}: Dont index Draft (118) and Draft talk (119) on hywiki ([[phab:T260804|T260804]]) (duration: 01m 04s)
* 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|803cb1a0d2c8cc6df8e4e88ab3c4d27eb71d01b3}}: Update taglines for various projects ([[phab:T258552|T258552]]) (duration: 01m 04s)
* 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|803cb1a0d2c8cc6df8e4e88ab3c4d27eb71d01b3}}: Update taglines for various projects ([[phab:T258552|T258552]]) (duration: 01m 06s)
* 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM ([[phab:T260192|T260192]])
* 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bb4aa44b0bd5b2b33d190d3af81e038e5fc55e3f}}: Configure namespaces on commons to include categories ([[phab:T198716|T198716]]) (duration: 01m 04s)
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9043331c1c1b352256cffd471b9ff128806607c}}: Update project wordmarks ([[phab:T254788|T254788]]; sync 2/2) (duration: 01m 04s)
* 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|b9043331c1c1b352256cffd471b9ff128806607c}}: Update project wordmarks ([[phab:T254788|T254788]]; sync 1/2) (duration: 01m 06s)
* 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM ([[phab:T260192|T260192]])
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6f8354e7599a5e92bea060807065f5b42c540e5}}: Enable $wgMFNoindexPages for all wikis ([[phab:T255458|T255458]]) (duration: 01m 07s)
* 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:38 mutante: decom'ing releases2001.codfw.wmnet (
* 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
* 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
* 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
* 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
* 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
* 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 14:41 rzl: disable puppet on cumin1001 for switchdc testing
* 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:34 gehel: depooling wdqs1007 and restarting blazegraph
* 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
* 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:03 _joe_: building and uploading fluent-bit, ratelimit images
* 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 12:57 _joe_: building a new version of the base docker images
* 11:29 awight: EU bacon finished
* 11:28 effie: restart mwdebug* servers
* 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:621227{{!}}Fix typos in flaggedrevs comments ()]] (duration: 01m 19s)
* 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:36 XioNoX: update firewall policies on pfw - [[phab:T260585|T260585]]
* 08:35 jayme: running puppet on A:all-mw-eqiad
* 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:20 godog: switch grafana.w.o to grafana 7 in codfw - [[phab:T259143|T259143]]
* 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 08:06 jayme: running puppet on A:all-mw-eqiad
* 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - [[phab:T259143|T259143]]
* 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 06:13 eileen: tools revision changed from {{Gerrit|b4ebd1e564}} to {{Gerrit|0b9d971bc4}}
* 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 02:53 cstone: civicrm revision changed from {{Gerrit|f5469d0a4c}} to {{Gerrit|154519cc1f}}
* 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
* 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend ([[phab:T180761|T180761]]) (duration: 05m 13s)
* 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
== 2020-08-18 ==
* 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D ([[phab:T258021|T258021]]) (duration: 01m 05s)
* 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D ([[phab:T258021|T258021]]) (duration: 01m 06s)
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
* 23:34 Urbanecm: Run scap pull at mw1301
* 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
* 23:32 mutante: rebooting mw1301 via mgmt
* 23:22 mutante: killed reboot-cluster on cumin1001
* 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac34f7274823e40d0c79752eb5ffe74c76856d04}}: Enable subpages in NS:0 in techconductwiki ([[phab:T260350|T260350]]) (duration: 05m 14s)
* 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
* 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
* 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
* 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]] (duration: 53m 12s)
* 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
* 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete [[phab:T256164|T256164]]
* 19:47 ejegg: updated payments-wiki from {{Gerrit|a7ee1790e0}} to {{Gerrit|ef7ebd08cb}}
* 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # [[phab:T256164|T256164]]
* 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/<nowiki>{</nowiki>REL1_27,REL1_29,REL1_30<nowiki>}</nowiki> # [[phab:T256164|T256164]]
* 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09{{!}}11{{!}}13).*
* 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)