
Difference between revisions of "Server Admin Log"

imported>Stashbot
(pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
 
(550 intermediate revisions by 3 users not shown)
== 2020-02-22 ==
== 2021-10-27 ==
* 00:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:40 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Allow upload by URL for Wikisources ([[phab:T293205|T293205]]), and enable it on enwikisource for autoconfirmed ([[phab:T294447|T294447]]) (duration: 01m 03s)
* 00:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:28 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734361{{!}}Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189)]] (duration: 01m 02s)
* 00:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:27 catrope@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-mni.svg: Config: [[gerrit:734361{{!}}Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189)]] (duration: 01m 03s)
* 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:06 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: [[phab:T294489|T294489]] (duration: 01m 15s)
* 21:42 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist wikipedia namespaceDupes.php --fix {{!}} tee namespacedupes-wikipedia-real.log # run namespaceDupes.php for all Wikipedias
* 21:38 urbanecm: run namespaceDupes.php for a bunch of Wikipedias
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T294489|T294489]] (duration: 01m 59s)
* 20:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
* 20:47 mutante: mw2255 - scap pull, repooling - after DRAC firmware was upgraded - [[phab:T283582|T283582]]
* 20:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 19:53 bblack: cp5xxx: switching unified cert to digicert-2021
* 19:49 bblack: cp5007: switching unified cert to digicert-2021
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]] (duration: 01m 47s)
* 19:34 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]]
* 19:28 bblack: cp5001: switching unified cert to digicert-2021
* 19:24 bblack: cp5xxx: disabling puppet ahead of digicert unified certificate update rollout
* 18:46 legoktm: installing python-swiftclient on mw1305 for debugging
* 18:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:22 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:15 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734697{{!}}Disable Education Program namespaces in eswiki (T294365)]] (duration: 01m 04s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734451{{!}}Temporarily change the votewiki lang to fa (T292685)]] (duration: 01m 04s)
* 17:40 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18] (duration: 06m 30s)
* 17:34 otto@deploy1002: Started deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18]
* 17:29 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18] (duration: 00m 07s)
* 17:29 otto@deploy1002: Started deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18]
* 16:42 otto@deploy1002: Finished deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18] (duration: 20m 30s)
* 16:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:21 otto@deploy1002: Started deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18]
* 15:36 ejegg: updated payments-wiki from {{Gerrit|6e810fb401}} to {{Gerrit|5b9fdd0fe1}}
* 15:28 volans: deployed new prefixes for drmrs in modules/network/data/data.yaml - [[phab:T282787|T282787]]
* 15:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:03 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:56 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:51 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:00 marostegui: Replace m5-master so it points to dbproxy1017 - [[phab:T288093|T288093]]
* 13:58 elukey: removed /var/run/confd-template/.inference*.err files from puppetmaster2001 (backup saved in /home/elukey just in case)
* 10:53 jbond: enable puppet fleet wide post gerrit:734937
* 10:43 jbond: disable puppet fleet wide to deploy a puppetmaster change gerrit:734937
* 10:43 jbond: disable puppet fleet wide to deploy a puppetmaster change
* 10:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:12 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/: {{Gerrit|305e97a}}, {{Gerrit|b9eaa20}}: GrowthExperiments backports ([[phab:T293434|T293434]], [[phab:T294386|T294386]]) (duration: 01m 04s)
* 10:10 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/: {{Gerrit|305e97a}}, {{Gerrit|667a4be}}: GrowthExperiments backports ([[phab:T293434|T293434]], [[phab:T294386|T294386]]) (duration: 01m 04s)
* 10:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:02 urbanecm: [urbanecm@mwdebug1001 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752
* 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:01 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752
* 09:25 godog: another run of backfill on graphite1004 - [[phab:T294355|T294355]]
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist replicas from s6 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17615 and previous config saved to /var/cache/conftool/dbconfig/20211027-092043-marostegui.json
* 09:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:04 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:50 topranks: Enabling Telxius circuit from cr1-eqiad to asw1-b12-drmrs with homer.
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Contributions replicas from s6.codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17614 and previous config saved to /var/cache/conftool/dbconfig/20211027-074935-marostegui.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist replicas from s6.codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17613 and previous config saved to /var/cache/conftool/dbconfig/20211027-072546-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges and recentchangeslinked replicas from s6.codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17612 and previous config saved to /var/cache/conftool/dbconfig/20211027-060634-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager replicas from s6.codfw [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17611 and previous config saved to /var/cache/conftool/dbconfig/20211027-053104-marostegui.json


== 2020-02-21 ==
== 2021-10-26 ==
* 23:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:59 legoktm: uploaded python-logstash to buster-wikimedia for [[phab:T294393|T294393]]
* 23:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 23:05 andrewbogott: updated (?) wikitech-static to 1.34.0
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:01 sbassett@deploy1001: Finished scap: Deploy security fix for [[phab:T232932|T232932]] (duration: 05m 35s)
* 21:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 21:56 sbassett@deploy1001: Started scap: Deploy security fix for [[phab:T232932|T232932]]
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:53 andrew@deploy1001: Finished deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two (duration: 03m 41s)
* 21:04 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: [[phab:T293783|T293783]] (duration: 01m 02s)
* 21:49 andrew@deploy1001: Started deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:45 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel (duration: 00m 11s)
* 21:03 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: [[phab:T293783|T293783]] (duration: 01m 02s)
* 21:45 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel
* 21:01 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/includes/api/ApiQueryImageInfo.php: [[phab:T293783|T293783]] (duration: 01m 03s)
* 21:23 mutante: LDAP - added ldickinson to wmf
* 21:00 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/includes/api/ApiQueryImageInfo.php: [[phab:T293783|T293783]] (duration: 01m 03s)
* 21:23 mutante: LDAP - added dduvall to archiva-deployers
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:51 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]]
* 20:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:48 eileen: civicrm revision changed from {{Gerrit|733a8fceda}} to {{Gerrit|dba74c443b}}, config revision is {{Gerrit|eed79486d5}}
* 20:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:38 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]] (duration: 25m 28s)
* 20:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 18:34 XioNoX: re-enable GRE tunnels on cr3-esams - [[phab:T245825|T245825]]
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]]
* 15:55 XioNoX: add gobgpd to buster-wikimedia repo
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 15:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:52 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 01m 34s)
* 15:06 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:50 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 13:38 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245778|T245778]] [[phab:T245182|T245182]] [[phab:T232140|T232140]] (duration: 01m 00s)
* 17:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 37s)
* 12:29 mark: cr3-esams: Shutdown GRE tunnels over Telia
* 17:06 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 12:27 akosiaris: repool mathoid at eqiad, test complete
* 17:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 1100m 51s)
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 16:25 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 12:20 moritzm: rebooting boron
* 16:25 cdanis@cumin1001: START - Cookbook sre.network.cf
* 12:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_wikidata_resubmit_changes_for_dispatch
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:23 mutante: mwmaint1002 - running puppet, created new mw periodic job from gerrit:732972 ([[phab:T294031|T294031]])
* 12:17 moritzm: bumped memory for boron.eqiad.wmnet to 16G
* 16:07 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:04 mark: cr3-esams: request chassis fpc offline slot 1
* 16:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:57 mark: Disabled Telia transit on cr3-esams
* 15:45 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:57 mark: Set VRRP prio cost to 50 on cr3-esams to make it backup VRRP
* 15:41 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:48 elukey: restart varnishkafka-webrequest on cp3052 (stuck in timeouts to kafka, analytics alarms raised)
* 15:38 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:47 elukey: restart varnishkafka-webrequest on cp3056/cp3058/cp3054/cp3064 (stuck in timeouts to kafka, analytics alarms raised)
* 15:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:39 elukey: restart varnishkafka on cp3057 (stuck in timeouts to kafka, analytics alarms raised)
* 15:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 11:21 godog: bounce logstash on logstash1023 - see if can catch up with elastic7 kafka lag
* 15:07 topranks: Running homer against cr3-esams to create new temp GRE tunnel to asw1-b12-drmrs
* 11:14 elukey: reboot stat1005 - GPU blocked at 100% after issue with tensorflow
* 15:02 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:18 akosiaris: depool mathoid in eqiad for a test
* 15:02 cdanis@cumin1001: START - Cookbook sre.network.cf
* 09:18 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 14:55 topranks: Adding static route on cr3-esams to asw1-b12-drmrs Telia link IP to allow GRE to be built.
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10473 and previous config saved to /var/cache/conftool/dbconfig/20200221-085405-marostegui.json
* 13:50 elukey: ran "Capirca Host Definition" script on netbox - output https://netbox.wikimedia.org/extras/scripts/results/1787315/
* 08:34 fdans@deploy1001: Finished deploy [analytics/refinery@4d56021]: deploying refinery (duration: 14m 55s)
* 13:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase: {{Gerrit|7723cf724df9ede49129443e43336e93efcd7a41}}: RecentChangeFactory: Add missing rc_logid value ([[phab:T293885|T293885]]) (duration: 01m 02s)
* 08:19 fdans@deploy1001: Started deploy [analytics/refinery@4d56021]: deploying refinery
* 13:40 elukey: ran "Capirca Host Definition" script on netbox-next to get up-to-date aqs_group host definition - result https://netbox-next.wikimedia.org/extras/scripts/results/894348/
* 08:02 akosiaris: disable mod_remoteip on otrs host, following merge of https://gerrit.wikimedia.org/r/573877
* 13:24 kart_: Updated cxserver to 2021-10-25-123807-production ([[phab:T217747|T217747]], [[phab:T218217|T218217]], [[phab:T292421|T292421]])
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 - [[phab:T245797|T245797]]
* 13:19 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 -
* 13:13 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:34 marostegui: Stop mysql on es1024 to clone es1025 - [[phab:T243052|T243052]]
* 13:05 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 05:57 marostegui: Start MySQL on labsdb1011 without replication - [[phab:T245797|T245797]]
* 13:05 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.4 (duration: 31m 07s)
* 05:44 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy18 - [[phab:T245797|T245797]]
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:53 bstorm_: depooled labsdb1011 and set weight 10 on labsdb1009 vs 3 on labsdb1010 [[phab:T245797|T245797]]
* 12:35 hashar: scap clean --delete 1.38.0-wmf.4 # [[phab:T293947|T293947]]
* 02:43 ejegg: updated Fundraising CiviCRM from {{Gerrit|a6b222c19f}} to {{Gerrit|c086fd4e0b}}
* 12:32 hashar: Applied security patches to 1.38.0-wmf.6 # [[phab:T293947|T293947]]
* 02:27 bstorm_: stopped mariadb on labsdb1011 because it keeps crashing anyway
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync Beta-Cluster-only change to CommonSettings now we're sure we won't revert (duration: 00m 56s)
* 12:31 hashar: scap prep 1.38.0-wmf.6 # [[phab:T293947|T293947]]
* 01:04 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages. (duration: 03m 32s)
* 12:16 jbond: upload cas_6.4.2-1+wmf10u3_amd64
* 01:01 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages.
* 12:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:51 urbanecm@deploy1002: Finished scap: {{Gerrit|c131f32e5e0804c8f5c2ec768b334c81a1b35151}}: Add namespace translations for [ami] Amis and [pwn] Paiwan ([[phab:T292414|T292414]], [[phab:T292415|T292415]]) (duration: 02m 25s)
* 11:49 urbanecm@deploy1002: Started scap: {{Gerrit|c131f32e5e0804c8f5c2ec768b334c81a1b35151}}: Add namespace translations for [ami] Amis and [pwn] Paiwan ([[phab:T292414|T292414]], [[phab:T292415|T292415]])
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|575a6a66b279c3d2d8974ffcc4911cc5b927be47}}: Fix HD logo size in some wikis ([[phab:T250731|T250731]]; 2/2) (duration: 00m 55s)
* 11:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|575a6a66b279c3d2d8974ffcc4911cc5b927be47}}: Fix HD logo size in some wikis ([[phab:T250731|T250731]]; 1/2) (duration: 00m 57s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 jbond: upload cas_6.4.2-1+wmf10u2_amd64.deb
* 10:40 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift
* 10:39 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift-ro
* 10:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:07 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Switching back graphite to eqiad (duration: 00m 55s)
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:06 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Switching back graphite to eqiad (duration: 01m 04s)
* 09:49 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:49 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:47 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:40 godog: flip back write traffic to graphite1004 (all but mediawiki) - [[phab:T247963|T247963]]
* 09:27 godog: move read traffic back to graphite1004 - [[phab:T247963|T247963]]
* 08:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:33 ema: upload varnish_6.0.8-1wm2 to component/varnish6 on apt.wm.org [[phab:T293879|T293879]]
* 08:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/maintenance: {{Gerrit|91316ed5714c4426a29fefded5c4db08dbba48bb}}: Add purgeExpiredMentorStatus.php ([[phab:T280307|T280307]]) (duration: 00m 56s)
* 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 07:07 effie: pool mw1319 and mw1312
* 07:05 effie: pool  wtp1026.eqiad.wmnet
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17606 and previous config saved to /var/cache/conftool/dbconfig/20211026-063647-root.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17605 and previous config saved to /var/cache/conftool/dbconfig/20211026-062144-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17604 and previous config saved to /var/cache/conftool/dbconfig/20211026-060640-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17603 and previous config saved to /var/cache/conftool/dbconfig/20211026-055136-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17602 and previous config saved to /var/cache/conftool/dbconfig/20211026-053633-root.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17601 and previous config saved to /var/cache/conftool/dbconfig/20211026-052129-root.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:24 krinkle@deploy1002: Synchronized wmf-config/logging.php: {{Gerrit|I0211e1c77}} (duration: 00m 55s)
* 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-02-20 ==
== 2021-10-25 ==
* 23:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245787|T245787]] [nlwiki] Add noindex for NS_USER and NS_USER_TALK (duration: 00m 56s)
* 23:12 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Create alias for Appendix and Appendix_talk namespaces on mywiktionary ([[phab:T291146|T291146]]) (duration: 00m 55s)
* 23:46 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgVectorPrintLogo for back-compat., not read since wmf.19 (duration: 00m 56s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw232[0-4].codfw.wmnet
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 mutante: gerrit1002 - test VM - rebooting for new disk
* 22:57 ryankemper: [wcqs] Downtimed `wcqs*` until roughly a week from now (while we setup oauth)
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231[7-9].codfw.wmnet
* 22:53 legoktm: uploaded PHP 7.4.25 to apt.wm.o (DSA-4992-1)
* 23:33 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw232[0-4].codfw.wmnet
* 22:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231[7-9].codfw.wmnet
* 22:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 03m 04s)
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2381[7-9].codfw.wmnet
* 22:27 ryankemper@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 23:25 mutante: ganeti1003 - adding another virtual 20G disk to gerrit1002 ([[phab:T243808|T243808]])
* 21:53 mutante: new project language "pwn" added - Paiwan is a native language of Taiwan, spoken by the Paiwan, a Taiwanese indigenous people. [[phab:T292415|T292415]]
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 mutante: new project language "ami" added - Sowal no 'Amis is the Formosan language of the 'Amis (or Ami), an indigenous people living along the east coast of Taiwan. - [[phab:T292414|T292414]]
* 23:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:50 mutante: log authdns1001 (DNS) - sudo authdns-update, add new project language "ami" (Amis) for [[phab:T292414|T292414]] - edited langlist.tmpl which regenerates all project zones
* 23:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/pager/IndexPager.php: IndexPager: Limit offset params to the max of the indices available (duration: 00m 56s)
* 21:40 mutante: authdns1001 (DNS) - sudo authdns-update, add new project language "pwn" (Paiwan) for [[phab:T292415|T292415]]
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 22:28 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad
* 19:47 mutante: mw2255 - depooled=inactive (incl "dsh groups"), shut down physically for [[phab:T283582|T283582]] - can be worked on anytime
* 22:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2255.codfw.wmnet
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 22:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1 (duration: 05m 05s)
* 19:42 mutante: icinga - ACKing all unhandled CRIT alerts on hosts with "dev" or "test" in their name, regardless of notifications being disabled or not. just so that we get more signal than noise in actual unhandled CRITs in web UI
* 22:23 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1
* 19:40 mutante: cumin2002 - sudo systemctl reset-failed to clear Icinga alert about failed but (now) non-existing service database-backups-snapshots.service, assuming it's a case of "only in active DC"
* 21:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245780|T245780]] [mediawikiwiki] Deny the 'flow-hide' right to logged out and non-autoconfirmed users (duration: 00m 56s)
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 20:07 James_F: Train 1.35.0-wmf.20 provisionally looks OK on all wikis. Closing [[phab:T233868|T233868]].
* 19:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.20
* 19:07 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily move mw groups to db1123 [[phab:T294295|T294295]]', diff saved to https://phabricator.wikimedia.org/P17597 and previous config saved to /var/cache/conftool/dbconfig/20211025-190717-kormat.json
* 19:55 twentyafterfour: hotfix deployed
* 19:06 mutante: db1112 - powercycling
* 19:51 twentyafterfour: deploying phabricator hotfix: https://phabricator.wikimedia.org/rPHEX2f36eee7ce67eb0c09e9bb0e79b42fc3b41d3597 for [[phab:T244165|T244165]]
* 19:04 legoktm@cumin1001: dbctl commit (dc=all): 'Depool db1112 ([[phab:T294295|T294295]])', diff saved to https://phabricator.wikimedia.org/P17596 and previous config saved to /var/cache/conftool/dbconfig/20211025-190436-legoktm.json
* 19:33 bblack: codfw+ulsfo repooled in geodns
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 fdans@deploy1001: Finished deploy [analytics/refinery@e05ae16]: deploying refinery (duration: 11m 31s)
* 18:40 jforrester@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/timeline/includes/Timeline.php: Backport: [[gerrit:734312{{!}}Input may be null when rendering a self-closing tag `<timeline />` (T294020)]] (duration: 00m 55s)
* 18:08 fdans@deploy1001: Started deploy [analytics/refinery@e05ae16]: deploying refinery
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:38 bblack: pushed codfw+ulsfo geodns depool
* 18:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:45 jynus: stop, upgrade and restart dbprov2002
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:26 jynus: stop, upgrade and restart dbprov1002
* 18:24 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 55s)
* 16:23 moritzm: installing Java security updates on Hadoop/Kafka Jumbo/AQS/Druid
* 18:22 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 54s)
* 16:16 jynus: stop, upgrade and restart db1140
* 18:19 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732840{{!}}Fix array declaration of NS_USER_TALK abbreviation on ruwikiquote (T197058)]] (duration: 00m 55s)
* 16:12 moritzm: installing postgres security updates on netboxdb*
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:03 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm (duration: 06m 15s)
* 18:15 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:732836{{!}}flaggedrevs: Drop legacy wgFlaggedRevsStatsAge config, no longer read]] (duration: 00m 55s)
* 15:57 fdans@deploy1001: Started deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:40 marostegui: Poweroff es2022 [[phab:T245714|T245714]]
* 18:12 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732254{{!}}Make reply tool available as opt-out on frwiki (T293687)]] (duration: 00m 56s)
* 15:32 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@95a7999]: deploying aqs (duration: 00m 48s)
* 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 15:32 fdans@deploy1001: Started deploy [analytics/aqs/deploy@95a7999]: deploying aqs
* 17:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 15:23 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@cbc3241]: deploying aqs (duration: 04m 06s)
* 17:39 mutante: mw2253 - scap pull after hw maintenance is over
* 15:19 fdans@deploy1001: Started deploy [analytics/aqs/deploy@cbc3241]: deploying aqs
* 17:32 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:38 Urbanecm: [dry-run; mwmaint1002] foreachwiki extensions/AbuseFilter/maintenance/fixOldLogEntries.php --dry-run --verbose ([[phab:T228655|T228655]])
* 17:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 12:53 moritzm: installing PHP updates on matomo1001/piwik
* 17:24 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:28 moritzm: installing PHP 7.0 security updates
* 17:23 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 12:11 Urbanecm: EU SWAT done
* 17:22 XioNoX: update core routers ACLs
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|728d739}}: Configure logo for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 17:20 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 12:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64240e1}}: Add logos for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 16:49 XioNoX: update management routers ACLs
* 11:19 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 16:36 XioNoX: DNS - Add eqsin-ulsfo transport v6 prefix - [[phab:T273308|T273308]]
* 11:08 moritzm: installing boost update from Buster point release
* 16:31 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10468 and previous config saved to /var/cache/conftool/dbconfig/20200220-105117-marostegui.json
* 16:28 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 10:12 Reedy: created $wikidb.blobs_cluster27 on es1023 - [[phab:T245720|T245720]]
* 16:25 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:08 Reedy: created $wikidb.blobs_cluster26 on es1020 - [[phab:T245720|T245720]]
* 16:25 mmandere@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:08 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 04s)
* 16:21 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 09:42 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 03s)
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 01s)
* 16:10 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2253.codfw.wmnet
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10467 and previous config saved to /var/cache/conftool/dbconfig/20200220-091233-marostegui.json
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:02 akosiaris: restart etherpad-lite on etherpad1002 [[phab:T244238|T244238]]
* 16:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:734298{{!}}Empty wikibase disabled access entity types on Beta (T294159)]] (beta-only) (duration: 01m 47s)
* 09:00 marostegui: Restart m1 database master db1135 (etherpad will not be available for around 1 minute) - [[phab:T244238|T244238]]
* 16:04 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 jynus: disable puppet and stop bacula service [[phab:T244238|T244238]]
* 16:01 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 08:35 marostegui: Upgrade mysql on db1135 without restart [[phab:T244238|T244238]]
* 15:57 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 52s)
* 07:47 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) - in case of cache issues (duration: 01m 03s)
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 15:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 01s)
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 54s)
* 07:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 15:46 jbond: upgrade cas/idp to 6.4.2
* 07:17 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 03s)
* 14:56 mutante: mw2253 - shut down and downtimed for 2 days
* 07:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 14:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 07:01 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) - extra sync for cache issue (duration: 01m 04s)
* 14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 07:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) (duration: 01m 06s)
* 14:49 mutante: depooling mw2253 for DRAC upgrade ([[phab:T283582|T283582]])
* 06:46 vgutierrez: test trafficserver 8.0.6-rc1 in cp30[64,65]
* 14:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10466 and previous config saved to /var/cache/conftool/dbconfig/20200220-062445-marostegui.json
* 14:45 jbond: update cas package
* 06:17 marostegui: Repool labsdb1011
* 14:31 marostegui: Deploy schema change on s3 codfw - [[phab:T291719|T291719]]
* 06:12 marostegui: Remove partitions from db1101:3318 - [[phab:T239453|T239453]]
* 12:04 ema: cp3062: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10465 and previous config saved to /var/cache/conftool/dbconfig/20200220-061213-marostegui.json
* 11:57 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 this host already had the partitions removed - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10464 and previous config saved to /var/cache/conftool/dbconfig/20200220-061019-marostegui.json
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10463 and previous config saved to /var/cache/conftool/dbconfig/20200220-060914-marostegui.json
* 11:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 on s8, db1099:3318 back to its original weight', diff saved to https://phabricator.wikimedia.org/P10462 and previous config saved to /var/cache/conftool/dbconfig/20200220-055943-marostegui.json
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:22 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571860{{!}}Allow non-autoconfirmed users to propose OAuth apps (T213760)]] (duration: 01m 04s)
* 11:24 Lucas_WMDE: UTC morning backport+config window done
* 00:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573397{{!}}Enable password-reset (requireemail pref) on test WD and Commons (T245660)]] (duration: 01m 03s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732969{{!}}Remove dispatchLagToMaxLagFactor Wikibase setting (T292604)]] (duration: 00m 54s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732951{{!}}Remove wikibaseDispatchRedisLockManager config (T292604)]] (duration: 00m 54s)
* 11:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732950{{!}}Remove wmg variables for dispatchChanges.php Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732949{{!}}Remove dispatchChanges.php-related Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732372{{!}}Remove dispatchViaJobs-related Wikibase settings (T291828)]] (duration: 00m 56s)
* 09:52 godog: bounce uwsgi graphite web on graphite2003 - [[phab:T294220|T294220]]
* 09:52 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:48 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:733089{{!}}[BETA CLUSTER] Enable WikibaseLexeme Scribunto access (T294159)]] (merged on Friday, syncing now to avoid outdated files even if it’s just -labs.php) (duration: 00m 55s)
* 09:18 godog: bounce graphite-web on graphite2003 to test timeout bump - [[phab:T294220|T294220]]
* 08:08 XioNoX: merge DNS changes to add drmrs
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 05:47 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=wtp1026.*
* 05:43 _joe_: pooling wtp1042 [[phab:T294212|T294212]]
* 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1109.eqiad.wmnet with OS buster
* 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1109.eqiad.wmnet with OS buster
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 (s8) for reimage [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17590 and previous config saved to /var/cache/conftool/dbconfig/20211025-043028-marostegui.json


== 2020-02-19 ==
== 2021-10-23 ==
* 23:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[0-3].eqiad.wmnet
* 16:40 dcausse: restarting blazegraph on wdqs1004 and wdqs1006 (free allocators alert)
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw137[4-9].eqiad.wmnet
* 15:45 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]]), testing whether [[phab:T291137|T291137]] is still an issue
* 23:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 23:28 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: cirrus: Reduce CirrusSearch-MoreLike cache workers and queue back to normal (duration: 01m 03s)
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw138[0-3].eqiad.wmnet
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw137[4-9].eqiad.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1363.eqiad.wmnet
* 23:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: redirect more_like from codfw back to eqiad (duration: 01m 04s)
* 23:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:57 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update (duration: 00m 57s)
* 22:56 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update
* 22:54 robh: cp3050 & cp3051 returned to service via [[phab:T243167|T243167]]
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgServer to protocol-relative for Wikitech and Test Wikitech (duration: 01m 05s)
* 22:37 robh: taking cp3050 & cp3051 offline for firmware update via [[phab:T243167|T243167]]
* 22:23 mutante: phabricator - upgrading PHP packages
* 22:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231([0-6]).codfw.wmnet
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231([0-6]).codfw.wmnet
* 22:11 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:10 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:08 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2314.codfw.wmnet
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:54 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:48 bblack: all authdns servers - upgrade to gdnsd-3.2.2
* 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:29 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:55 eileen: civicrm revision changed from {{Gerrit|52c68911c6}} to {{Gerrit|a6b222c19f}}, config revision is {{Gerrit|561ae21f77}}
* 20:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:13 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 05s)
* 20:05 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.20 (duration: 01m 03s)
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.20
* 20:02 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:02 rzl@cumin1001: conftool action : set/weight=10; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 19:54 rlazarus: scap pull on new api servers mw13[56-62]
* 19:50 mutante: generating mcrouter certs for new codfw mw appservers
* 19:39 mutante: initial puppet run on new hosts mw231*
* 19:31 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:21 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T244577|T244577]] [metawiki] Disable MobileFrontend mainpage special casing (duration: 01m 04s)
* 19:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244369|T244369]] [trwiki] Enable the WikidataPageBanner extension (duration: 01m 05s)
* 19:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: [[phab:T245570|T245570]] resourceloader: fix SqlDependencyModuleStore::setMulti() to use upsert() (duration: 01m 01s)
* 18:45 bblack: dns4001 - upgraded to gdnsd-3.2.2
* 18:44 bblack: reprepro: upload gdnsd 3.2.2-1~wmf1 to buster-wikimedia
* 18:39 mutante: mwmaint1002 - sudo systemctl reset-failed to clear systemd alerts
* 18:38 mutante: mwmaint1002 - removing Icinga ACK for systemd state - comments for it were from HHVM removal in Oct 2019
* 18:26 mutante: phab2001 - upgraded ssh-server, kept locally modified config; apt autoremove removes python3-debconf
* 18:23 mutante: phab2001 - installing package upgrades, incl. openssh, PHP version
* 18:22 mutante: phab2001 - upgrading mariadb client package versions
* 18:19 mutante: removing problem ACK from Icinga alerts for wikitech-static MediaWiki version. comments were about things in 2019
* 17:48 robh: cp1089 cp1090 returned to service via [[phab:T243167|T243167]]
* 17:40 jynus: starting data check between db1078 and db1140:3313 [[phab:T244958|T244958]]
* 17:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (just in case of cache issue) (duration: 01m 04s)
* 17:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (duration: 01m 01s)
* 17:14 ema: cp4026: repool after probe Connection:keep-alive experiment revert https://gerrit.wikimedia.org/r/573337
* 17:12 robh: cp1088 returned to service, cp1089 & cp1090 offline for firmware update via [[phab:T243167|T243167]]
* 16:44 papaul: replacing ps1-a8-codfw; mgmt in rack A8 will go down
* 16:37 otto@deploy1001: Finished deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist (duration: 12m 27s)
* 16:32 ema: depool cp4026, 5xx
* 16:24 otto@deploy1001: Started deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist
* 16:13 marostegui: Depool labsdb1011 to help replication to catch up
* 16:05 elukey: Update analytics-in4 filter term eventgate for [[phab:T245203|T245203]] on cr1/cr2 eqiad
* 15:48 ariel@deploy1001: Finished deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests (duration: 00m 03s)
* 15:48 ariel@deploy1001: Started deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests
* 14:59 marostegui: Stop mysql on es2021 - [[phab:T243052|T243052]]
* 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 marostegui: Data checksum on db1084 [[phab:T245621|T245621]]
* 14:07 marostegui: Upgrade and reboot db1084 - [[phab:T245621|T245621]]
* 14:02 marostegui: Start mysql on db1084 without replication - [[phab:T245621|T245621]]
* 13:53 jbond42: disable puppet to upgrade postgresql
* 13:30 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1084, lots of connection errors', diff saved to https://phabricator.wikimedia.org/P10458 and previous config saved to /var/cache/conftool/dbconfig/20200219-133057-jynus.json
* 12:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]], take II, the cache issue (duration: 01m 04s)
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]] (duration: 01m 06s)
* 11:56 volans: better splay of periodic scripts that interact with Netbox - [[phab:T244291|T244291]]
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 04s)
* 11:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 12s)
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 jynus: upgrading mariadb client on cumin hosts
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315, db2089:3316 after new package testing', diff saved to https://phabricator.wikimedia.org/P10457 and previous config saved to /var/cache/conftool/dbconfig/20200219-103806-marostegui.json
* 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:17 jynus: stopping db2089 mariadb@s5
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw135[0-5]*.eqiad.wmnet
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw135[0-5]*.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1349.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1349.eqiad.wmnet
* 10:09 moritzm: updated tftpboot environment for stretch-bootif for the 9.12 point release [[phab:T241359|T241359]]
* 09:53 jynus: stopping and upgrading db1140 instances
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315, db2089:3316 for new package testing', diff saved to https://phabricator.wikimedia.org/P10455 and previous config saved to /var/cache/conftool/dbconfig/20200219-095139-marostegui.json
* 09:51 marostegui: Depool db2089:3315, db2089:3316 for new package testing
* 09:49 akosiaris: [[phab:T245516|T245516]]. Deploy mathoid chart version 0.0.27, removing logstash gelf configuration
* 09:46 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 09:43 vgutierrez: test trafficserver 8.0.6-rc1 in cp40[26,32]
* 09:34 _joe_: cleared opcache on mw1313
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:33 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:50 marostegui: Remove dbproxy1007 grants from m2 - [[phab:T231280|T231280]]
* 08:41 marostegui: Remove wikiadmin2 user from s7 - [[phab:T243512|T243512]]
* 08:23 Urbanecm: run mwscript deleteEqualMessages.php cswiki --delete
* 08:14 godog: roll restart swift proxies - [[phab:T244776|T244776]]
* 07:02 marostegui: Remove wikiadmin2 user from es2 - [[phab:T243512|T243512]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10454 and previous config saved to /var/cache/conftool/dbconfig/20200219-065726-marostegui.json
* 06:35 marostegui: Compress watchlist_expiry table on s3 (this will take hours as I have left a 60 seconds sleep between tables) - [[phab:T245358|T245358]]
* 06:17 marostegui: Compress new and empty watchlist_expiry table - [[phab:T245358|T245358]]
* 01:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 01:22 mutante: mw1353 - restarted apache (some race condition on new installs, 5 other servers did not have the issue)
* 01:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1355.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1354.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1350.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1353.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1351.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1352.eqiad.wmnet
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T240728|T240728]] Fix Latin Wikipedia (VICIPÆDIA) wordmark and set size correctly (duration: 01m 06s)
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:43 James_F: Manually purged https://en.wikipedia.org/images/mobile/copyright/wikipedia-wordmark-la.svg and .png from Varnish for [[phab:T240728|T240728]]
* 00:41 jforrester@deploy1001: Synchronized static/images/mobile/copyright/: [[phab:T240728|T240728]] Sync logo images (duration: 01m 04s)
* 00:40 mutante: mw1351 through mw1355 - initial puppet runs - new appservers
* 00:36 niharika29@deploy1001: Synchronized static/images/mobile/copyright/: Remove unnecessary id from wordmark (duration: 01m 03s)
* 00:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adjust MT Threshold for Assamese to 70% - [[phab:T245509|T245509]] (duration: 01m 04s)
* 00:24 niharika29@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikimediaEvents/: Follow up on authevents statsd changes in {{Gerrit|I7612b68fe}} (duration: 01m 03s)
* 00:21 niharika29@deploy1001: Synchronized wmf-config/logging.php: Update authmanager-statsd channel name (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|8c77e9e915}} to {{Gerrit|52c68911c6}}, config revision is {{Gerrit|561ae21f77}}
* 00:10 niharika29@deploy1001: Synchronized wmf-config/logging.php: Make the logstash and authmanager-statsd Monolog handlers compatible (duration: 01m 04s)
* 00:08 mutante: creating mcrouter certs for mw1350


== 2020-02-18 ==
== 2021-10-22 ==
* 23:56 mutante: mw1349 - scap pull
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1349.eqiad.wmnet
* 20:57 bblack: re-pooling eqiad in DNS
* 23:34 maryum: running reindex on mwmaint1002 - [[phab:T194448|T194448]]
* 20:54 legoktm: <XioNoX> I disabled the interface on cr1, going to re-enable the active one on cr2
* 23:28 maryum: running reindex for wikimedia wikis
* 20:48 legoktm: bblack has temporarily depooled eqiad https://gerrit.wikimedia.org/r/733043
* 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:41 XioNoX: disable sessions to equinix eqiad IXP
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2151.wmnet
* 19:17 urbanecm: Start server-side upload of 1 video file ([[phab:T294134|T294134]])
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2150.wmnet
* 15:06 jbond: upload puppetboard_3.1.0-1_all.deb to bullseye-wikimedia
* 23:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:42 ema: deployment-cache-upload06: restart varnish-frontend, package got upgraded to 6.0.8 [[phab:T294116|T294116]]
* 22:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (extra sync for [[phab:T236104|T236104]]) (duration: 01m 04s)
* 13:30 jbond: upload python3-pypuppetdb_2.4.0-1_all.deb to bullseye
* 22:54 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (duration: 01m 03s)
* 10:46 jbond: upload cas_6.4.2-1+wmf10u1
* 22:52 chaomodus: completed upgrading Netbox to 2.7.4 [[phab:T244291|T244291]]
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 22:51 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3) (duration: 00m 11s)
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 22:51 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3)
* 09:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294029|T294029]]
* 22:49 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2) (duration: 01m 19s)
* 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS buster
* 22:48 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2)
* 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (duration: 01m 19s)
* 08:27 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 22:45 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]]
* 08:24 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 22:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244185|T244185]] Raise minimum log level for 'OAuth' from DEBUG to INFO (duration: 01m 04s)
* 08:23 ema: cp3062: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 22:30 chaomodus: Upgrading Netbox to 2.7.4
* 08:00 ema: deployment-cache-text06: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 21:56 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17580 and previous config saved to /var/cache/conftool/dbconfig/20211022-055403-root.json
* 21:54 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17579 and previous config saved to /var/cache/conftool/dbconfig/20211022-053900-root.json
* 21:26 XioNoX: rollback tcp-mss clamping in eqiad/eqord
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17578 and previous config saved to /var/cache/conftool/dbconfig/20211022-052356-root.json
* 21:07 jeh: power down and set icinga downtime on cloudvirt1022 [[phab:T243536|T243536]]
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17577 and previous config saved to /var/cache/conftool/dbconfig/20211022-050852-root.json
* 21:07 jeh: power down and set icinga downtime on cloudvirt1022 [[phab:T241884|T241884]]
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17576 and previous config saved to /var/cache/conftool/dbconfig/20211022-045349-root.json
* 20:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on metawiki - [[phab:T242122|T242122]] (duration: 01m 03s)
* 04:46 marostegui_: Deploy schema change on s8 codfw - [[phab:T291719|T291719]]
* 20:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]] (duration: 07m 59s)
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17575 and previous config saved to /var/cache/conftool/dbconfig/20211022-043845-root.json
* 20:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on testwiki - [[phab:T242122|T242122]] (duration: 01m 04s)
* 02:59 ejegg: updated payments-wiki from {{Gerrit|088a8cda1e}} to {{Gerrit|6e810fb401}}
* 20:39 ppchelko@deploy1001: Started deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]]
* 20:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/libs/StatusValue.php: [[phab:T245155|T245155]] StatusValue: Fix __toString() to not choke on special parameters (duration: 01m 04s)
* 20:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.20 [[phab:T233868|T233868]]
* 19:52 jforrester@deploy1001: Finished scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]] (duration: 61m 01s)
* 19:41 papaul: shutting down dns2001 for 10G card troubleshooting
* 19:30 James_F: Running `foreachwiki sql.php php-1.35.0-wmf.19/maintenance/archives/patch-watchlist_expiry.sql` for [[phab:T244631|T244631]]
* 18:51 jforrester@deploy1001: Started scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]]
* 18:49 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.18 (duration: 15m 29s)
* 18:25 James_F: Running `scap prep` for 1.35.0-wmf.20 ref. [[phab:T233868|T233868]]
* 18:01 James_F: 1.35.0-wmf.20 was branched at {{Gerrit|c664b4f1b933d110bd69f074c399695bd6b17d13}} for [[phab:T233868|T233868]]
* 18:01 marxarelli: completed promotion of 1.35.0-wmf.19 to all wikis ([[phab:T233867|T233867]])
* 17:52 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Re-roll all wikis to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 17:47 marxarelli: re-rolling wmf.19 to all wikis ([[phab:T233867|T233867]]) with eyes particularly on ([[phab:T245202|T245202]])
* 17:28 bblack: cp3 (esams edge) - revert GRE MTU mitigations - [[phab:T232602|T232602]]
* 17:00 papaul: resetting ps1-a8-codfw see [[phab:T245164|T245164]]
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:08 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:02 ottomata: deploying new 'canary' and 'production' releases for eventgate-main.  (These releases use a new nodePort, and so will not be active until LVS is modified.  The old 'main' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 16:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 15:51 bblack: dns2001 - shutdown for hw/reimage work - [[phab:T242017|T242017]]
* 15:47 bblack: dns2001 - stopping bgp to drain service for hw/reimage work - [[phab:T242017|T242017]]
* 15:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:36 jynus: stopping db1140:s3 instance
* 15:35 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:08 vgutierrez@puppetmaster1001: conftool action : set/weight=100; selector: dc=eqiad,cluster=cache_text,service=ats-be,name=cp1089.eqiad.wmnet
* 15:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:56 bblack: esams repooled in DNS
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:54 ottomata: deploying new 'canary' and 'production' releases for eventgate-analytics.  (These releases use a new nodePort, and so will not be active until LVS is modified.  The old 'analytics' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:39 XioNoX: remove cr2-esams VRRP handicap - [[phab:T243080|T243080]]
* 14:34 XioNoX: restore default esams-eqiad link cost - [[phab:T243080|T243080]]
* 14:33 XioNoX: re-enable cr2-esams BGP transit/peering - [[phab:T243080|T243080]]
* 14:31 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 14:29 XioNoX: re-disable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/DiscussionTools: [[gerrit:572882{{!}}wmf.18: Add config option and query parameter to control loading]] (duration: 01m 11s)
* 14:02 cdanis: depool esams
* 14:01 XioNoX: re-enable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 25 -> 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10448 and previous config saved to /var/cache/conftool/dbconfig/20200218-135525-marostegui.json
* 13:44 XioNoX: installing OS on cr2-esams:re0 - [[phab:T243080|T243080]]
* 13:39 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 13:37 XioNoX: deactivate peering/transit on cr2-esams - [[phab:T243080|T243080]]
* 13:24 XioNoX: reboot cr2-esams:re1 (backup) - [[phab:T243080|T243080]]
* 13:23 XioNoX: bump cost of eqiad-esams transport - [[phab:T243080|T243080]]
* 13:10 XioNoX: fail vrrp master to cr3-esams - [[phab:T243080|T243080]]
* 12:58 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 12:55 Amir1: EU SWAT done
* 12:53 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]], take II (duration: 01m 03s)
* 12:52 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]] (duration: 01m 04s)
* 12:45 XioNoX: remove graceful-switchover and nonstop-routing from cr2-esams - [[phab:T243080|T243080]]
* 12:36 XioNoX: push new Junos to cr2-esams:re1 (backup RE, noop) - [[phab:T243080|T243080]]
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part II (duration: 01m 03s)
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I, take II (the cache issue) (duration: 01m 04s)
* 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I (duration: 01m 06s)
* 12:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572628{{!}}Start reading for the new term store for clients up to Q1000 (T225057)]] (duration: 01m 05s)
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b193dd}}: Increase Commons linkpurge rate limit for patrollers ([[phab:T245214|T245214]]) (duration: 01m 31s)
* 11:51 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:48 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:47 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:43 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:41 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:35 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:27 jynus: reenabling prometheus exporter metadata user for prometheus1003
* 11:10 jynus: temp. disabling prometheus exporter metadata user for prometheus1003
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 15 -> 25 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10445 and previous config saved to /var/cache/conftool/dbconfig/20200218-104958-marostegui.json
* 09:27 gehel: re-enable puppet on mw* - [[phab:T222321|T222321]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10444 and previous config saved to /var/cache/conftool/dbconfig/20200218-091343-marostegui.json
* 09:09 gehel: disabling puppet on mw* to deploy apache config change - [[phab:T222321|T222321]]
* 09:07 volans: rm /var/log/exim4/paniclog on cumin1001 to clear last week's OOM error
* 08:59 marostegui: Remove wikiadmin2 grants from es1 [[phab:T243512|T243512]]
* 08:59 marostegui: Remove wikiadmin2 grants from es1
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options', diff saved to https://phabricator.wikimedia.org/P10443 and previous config saved to /var/cache/conftool/dbconfig/20200218-085713-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10442 and previous config saved to /var/cache/conftool/dbconfig/20200218-082306-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10441 and previous config saved to /var/cache/conftool/dbconfig/20200218-080952-marostegui.json
* 08:08 marostegui: Restart MySQL to pick up optimizer_switch changes - [[phab:T245489|T245489]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10440 and previous config saved to /var/cache/conftool/dbconfig/20200218-080623-marostegui.json
* 07:34 elukey: powercycle analytics1065 (crashed hours ago, no mgmt console available, no ssh)
* 06:39 marostegui: Remove wikiadmin2 from pc1007, pc1008, pc1009 and pc1010 [[phab:T243512|T243512]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 100 -> 200 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10439 and previous config saved to /var/cache/conftool/dbconfig/20200218-063819-marostegui.json
* 06:27 marostegui: Stop haproxy on dbproxy1007 - [[phab:T245385|T245385]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 and weight 10 in API for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10438 and previous config saved to /var/cache/conftool/dbconfig/20200218-062459-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-02-17 ==
== 2021-10-21 ==
* 19:56 cdanis: finish enabling TCP-MSS clamping in eqiad
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:49 cdanis: s/no-op//
* 23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 19:49 cdanis: no-op enable TCP-MSS clamping on eqord and eqiad
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:33 cdanis: no-op enable flowspec change on cr2-eqord and cr2-eqiad
* 23:37 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 18:25 elukey: restart kafka on kafka-jumbo1001 to pick up new openjdk updates
* 23:36 jforrester@deploy1002: Synchronized multiversion/: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 17:25 bblack: GRE MTU mitigations applied to esams cp hosts only - [[phab:T232602|T232602]]
* 23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:33 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 15:50 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:48 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:44 cdanis: ✔️ cdanis@icinga1001.wikimedia.org ~ 🕥☕ sudo systemctl restart ircecho
* 23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:730946{{!}}CommonSettings: Drop legacy CentralAuth config flag, never read (T277932)]] (duration: 00m 55s)
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10422 and previous config saved to /var/cache/conftool/dbconfig/20200217-143146-marostegui.json
* 23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 14:17 ema: reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb [[phab:T183146|T183146]]
* 23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 12:34 XioNoX: add test flowspec rules to cr3-knams
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 moritzm: installing postgresql-9.4 security updates
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:27 vgutierrez: reboot acmechief instances (kernel upgrade)
* 22:42 mutante: [[phab:T294038|T294038]] [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created.  . .Successfully sent email
* 10:31 jynus: dropping all databases from db1140:3313
* 21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): ' db1107 increase API weight from 10 to 15 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10420 and previous config saved to /var/cache/conftool/dbconfig/20200217-102218-marostegui.json
* 21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 10:20 vgutierrez: rolling restart of ats-tls and varnish-fe on ulsfo to enable KA between them - [[phab:T244464|T244464]]
* 20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
* 10:00 moritzm: installing Linux 4.9.210 kernels on stretch systems
* 20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 09:10 godog: correction, +100G
* 20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
* 09:09 godog: +10G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 09:06 godog: +50G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:22 marostegui: Stop haproxy on dbproxy1002 - [[phab:T245384|T245384]]
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container ([[phab:T293050|T293050]]) (duration: 00m 55s)
* 19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
* 19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
* 19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
* 19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
* 18:53 urbanecm: Deploy security patch for [[phab:T285116|T285116]] (wmf.4, wmf.5)
* 18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
* 17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
* 17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on [[phab:T294010|T294010]] (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
* 17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (2/3) (duration: 00m 54s)
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (2/3) (duration: 00m 55s)
* 15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (1/3) (duration: 00m 57s)
* 15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 54s)
* 15:12 Lucas_WMDE: my next message accidentally says 1/3 again but it’s 2/3, sorry
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 14:42 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/Wikibase.default.php: Backport: [[gerrit:732666{{!}}Enable dispatching via jobs by default (T291828)]] (duration: 00m 55s)
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client/: Backport: [[gerrit:732674{{!}}Fix ExternalUserNames service wiring for local database]] (duration: 00m 57s)
* 14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:56 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:55 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:34 volans: uploaded spicerack_1.0.6 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 11:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:13 Lucas_WMDE: UTC morning backport+config window done
* 11:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294008|T294008]]
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730848{{!}}Configure event stream for map tiles state change (T289771)]] (duration: 01m 04s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:47 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:14 jbond: merging refactor of P:base Gerrit:714975
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:56 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 03s)
* 08:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:26 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:25 ema: cp3062: revert vsl_space experiment [[phab:T293879|T293879]]
* 08:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1004.eqiad.wmnet with OS bullseye
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17563 and previous config saved to /var/cache/conftool/dbconfig/20211021-080330-root.json
* 07:56 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1004.eqiad.wmnet with OS bullseye
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17562 and previous config saved to /var/cache/conftool/dbconfig/20211021-074826-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17561 and previous config saved to /var/cache/conftool/dbconfig/20211021-073323-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17560 and previous config saved to /var/cache/conftool/dbconfig/20211021-071819-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17559 and previous config saved to /var/cache/conftool/dbconfig/20211021-070315-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17558 and previous config saved to /var/cache/conftool/dbconfig/20211021-064812-root.json
* 06:35 elukey: `systemctl reload nginx` on cloudelastic100[5,6] to pick up the new TLS certificate and clear alerts - [[phab:T293826|T293826]]
* 04:47 marostegui: Deploy schema change on s5 codfw - [[phab:T291719|T291719]]
* 04:37 marostegui: Deploy schema change on s6 codfw - [[phab:T291719|T291719]]
* 04:04 legoktm: restarted apache on lists1001 so it only uses new TLS cert ([[phab:T293826|T293826]])
* 03:29 eileen: civicrm revision changed from {{Gerrit|e889831012}} to {{Gerrit|733a8fceda}}, config revision is {{Gerrit|eed79486d5}}
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-02-15 ==
== 2021-10-20 ==
* 01:01 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕗🍺 sudo systemctl restart hive-server2.service ; sudo systemctl restart hive-metastore.service
* 23:56 thcipriani@deploy1002: Finished scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]] (duration: 11m 41s)
* 23:44 thcipriani@deploy1002: Started scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]]
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace [[phab:T291018|T291018]] (duration: 01m 02s)
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace [[phab:T291018|T291018]] (duration: 01m 04s)
* 22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
* 22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
* 21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
* 21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
* 21:50 dancy: Testing a series of one-file scap sync-file runs
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9cf996a38d82fdd67e600a5a951e88423957e8d}}: Promote Growth features out of darkmode on several wikis  ([[phab:T291826|T291826]], [[phab:T255037|T255037]], [[phab:T287878|T287878]]) (duration: 01m 04s)
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:38 eileen: civicrm revision changed from {{Gerrit|9b5e0d015b}} to {{Gerrit|e889831012}}, config revision is {{Gerrit|eed79486d5}}
* 20:25 legoktm: uploaded php7.4 on buster to apt.wm.o ([[phab:T293449|T293449]])
* 19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
* 19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
* 19:09 mutante: disabling puppet on mw* for a minute to deploy a change
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org -  [[phab:T293810|T293810]]
* 17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
* 17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - [[phab:T293860|T293860]] (duration: 01m 03s)
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - [[phab:T293895|T293895]] (duration: 01m 03s)
* 16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - [[phab:T293894|T293894]] (duration: 01m 09s)
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
* 16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
* 16:13 jbond: upload cas_6.4.2-1_amd64.deb
* 15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:57 moritzm: installing modsecurity-crs security updates on Buster
* 14:48 moritzm: installing xmlgraphics-commons security updates on Buster
* 14:46 moritzm: installing irssi security updates on Buster
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:35 moritzm: installing commons-io security updates on Buster
* 14:27 ema: cp3062: test higher vsl_space values [[phab:T293879|T293879]]
* 14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:12 moritzm: installing ruby2.3 security updates
* 13:40 moritzm: installing apache2 security updates on buster
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 01m 02s)
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
* 12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M [[phab:T293879|T293879]] - varnish restart needed
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:02 urbanecm@deploy1002: Finished scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]]) (duration: 25m 19s)
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
* 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:37 urbanecm@deploy1002: Started scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]])
* 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
* 11:21 moritzm: installing ffmpeg security updates
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e520fc57411bb19123766192cd636396ea6fc59d}}: GrowthExperiments: Add campaign pattern for enwiki ([[phab:T293699|T293699]]) (duration: 01m 22s)
* 11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
* 10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
* 07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
* 06:35 marostegui: Upgrade db1106
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
* 06:31 dcausse: restarting blazegraph on wdqs1012
* 06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
* 06:21 marostegui: Depool clouddb1013 for upgrade
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
* 06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
* 06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:05 XioNoX: put transport link between ulsfo and eqsin in service - [[phab:T273308|T273308]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
* 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis ([[phab:T288848|T288848]]) (duration: 01m 05s)
* 01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 tgr: west coast evening deploys done


== 2020-02-14 ==
== 2021-10-19 ==
* 23:42 XenoRyet: updated civicrm from {{Gerrit|cf86495d44}} to {{Gerrit|8c77e9e915}}
* 23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103{{!}}Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s)
* 21:01 volker-e@deploy1001: Finished deploy [design/style-guide@1928c00]: Deploy design/style-guide:  (duration: 00m 09s)
* 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:01 volker-e@deploy1001: Started deploy [design/style-guide@1928c00]: Deploy design/style-guide:
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent some logspam [[phab:T245280|T245280]] (duration: 01m 05s)
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053{{!}}ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s)
* 19:27 XenoRyet: updated civicrm from {{Gerrit|55b2afb6eb}} to {{Gerrit|cf86495d44}}
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase: [[phab:T245062|T245062]] Prevent invalid term languages from cached PrefetchingTermLookup (duration: 01m 09s)
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 jforrester@deploy1001: Unlocked for deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (duration: 03m 05s)
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565{{!}}Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s)
* 17:33 jforrester@deploy1001: Locking from deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (planned duration: 60m 00s)
* 23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953{{!}}Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s)
* 16:11 moritzm: installing git-lfs updates from Buster 10.3 point update
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:55 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb10u1 to apt.wikimedia.org
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:55 bblack: (log(n))
* 23:13 tgr@deploy1002: Synchronized static: Config: [[gerrit:731231{{!}}Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s)
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10414 and previous config saved to /var/cache/conftool/dbconfig/20200214-155443-marostegui.json
* 21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete {{!}} fixed Icinga alert:  RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: [[phab:T165885|T165885]]
* 15:52 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb9u1 to apt.wikimedia.org
* 21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
* 15:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Resync InitialiseSettings to try and pick up previously deployed cirrus query routing changes (duration: 01m 05s)
* 20:56 ejegg: updated payments-wiki from {{Gerrit|0f48acea49}} to {{Gerrit|30e596903d}}
* 15:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 15:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: {{Gerrit|a84a675}}: {{Gerrit|3231578}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 15:32 effie: restart mc-gp* for updates
* 18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: {{Gerrit|694580a}}: {{Gerrit|c02e301}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 15:17 bd808: Toil reduction: !log messages now work from the SRE team's Freenode channel.
* 18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 13:50 gehel: restart relforge for JVM upgrade - [[phab:T245120|T245120]]
* 18:30 foks: deleting 1 more email with deleteUserEmail.php
* 10:35 vgutierrez: revert ats 8.0.6-rc0 experiment on cp40[26,32]
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1476a2d93}}: {{Gerrit|dd8393c1a0}}: foundationwiki: Restrict sensitive namespaces to editor group ([[phab:T205350|T205350]]) (duration: 01m 03s)
* 10:14 vgutierrez: rolling restart of ats-be to enable TLSv1.3 against origin servers - [[phab:T170567|T170567]]
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10409 and previous config saved to /var/cache/conftool/dbconfig/20200214-093456-marostegui.json
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a2893c7190e615a247674dbf7f87348bf43b91c}}: Enable topic subscriptions as a beta feature on all remaining projects ([[phab:T287802|T287802]]) (duration: 01m 04s)
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (2/2) (duration: 01m 06s)
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (1/2) (duration: 01m 05s)
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 09:25 volans: manually absented /usr/local/bin/apt2xml on the 5 hosts with puppet disabled
* 17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 09:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 08:46 moritzm: installing 4.19.98 kernel update on Buster systems
* 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10408 and previous config saved to /var/cache/conftool/dbconfig/20200214-080600-marostegui.json
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 06:51 vgutierrez: updating puppet compiler facts
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 01:27 dpifke@deploy1001: Finished deploy [performance/navtiming@2eec00a]: (no justification provided) (duration: 00m 05s)
* 15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 01:27 dpifke@deploy1001: Started deploy [performance/navtiming@2eec00a]: (no justification provided)
* 15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245202|T245202]] cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
* 15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - [[phab:T277193|T277193]] (duration: 01m 04s)
* 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: [[phab:T245202|T245202]] cirrus: Increase the pool counter limits a bit (duration: 01m 05s)
* 15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while I merge G:662699
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 45m 13s)
* 13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
* 12:40 moritzm: installing aftpd security updates
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
* 12:34 marostegui: Upgrade dbstore1003
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
* 12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - [[phab:T288843|T288843]]
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
* 12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: {{Gerrit|ec0125770775c1a1a54c3b592d86d287fd9e3ad6}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 55s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
* 12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: {{Gerrit|79808a90a95dd5dac2b532b87fb7ec1a490ea0f0}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 56s)
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - [[phab:T288843|T288843]]
* 11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
* 11:46 marostegui: Upgrade db1105 (s1,s2)
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c31b04e50101a60db7ae8acae64bc031f5e1007}}: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
* 10:56 marostegui: Upgrade clouddb1021
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 10:51 moritzm: failover master in ganeti-test to ganeti2026
* 10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
* 10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
* 10:38 oblivian@deploy1002: Synchronized w/static.php: Config: [[gerrit:730182{{!}}static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:37 marostegui: Upgrade db1101 (s7,s8)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - [[phab:T247963|T247963]]
* 10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
* 10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
* 09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
* 09:37 godog: move graphite/statsd writes to graphite2003 - [[phab:T247963|T247963]]
* 09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3  # [[phab:T281169|T281169]]
* 09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # [[phab:T281169|T281169]]
* 09:19 marostegui: Stop slave on db2112 [[phab:T290865|T290865]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:03 XioNoX: push anycast tuning to all Telia transit links - [[phab:T288843|T288843]]
* 08:50 godog: point graphite.discovery.wmnet to graphite2003 - [[phab:T247963|T247963]]
* 08:40 XioNoX: push prep-work for anycast tuning to all sites - [[phab:T288843|T288843]]
* 08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
* 08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
* 08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - [[phab:T288843|T288843]]
* 08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
* 06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
* 06:06 marostegui: Upgrade dbstore1005
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
* 06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:03 marostegui: Upgrade db1184, db1178
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 05:46 marostegui: Reimage db2112 (s1 codfw master) [[phab:T290865|T290865]]
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2020-02-13 ==
== 2021-10-18 ==
* 22:13 jeh: running filesystem tests on cloudvirt1024 [[phab:T241884|T241884]]
* 23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b654980240d51fff3c6e9c48f7076d4609c2560f}}: Create an alias for the Draft namespace on hrwiki ([[phab:T291755|T291755]]) (duration: 00m 56s)
* 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:40 jbond42: refresh facts on compilers
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # [[phab:T291761|T291761]]
* 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abe777d28594da852e49ccb1c1597b2598f3e483}}: Create Rhymes namespace for thwiktionary ([[phab:T291761|T291761]]) (duration: 00m 57s)
* 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying the 'logging-external' release) (safe because eventgate-logging-external is not in use)  - [[phab:T245203|T245203]]
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests ([[phab:T288848|T288848]]) (duration: 00m 56s)
* 20:33 marxarelli: rollback to group1 due to 500 spike (2k/min) ([[phab:T233867|T233867]])
* 22:06 maryum: deployed security patch for [[phab:T293589|T293589]]
* 20:32 dduvall@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 21:23 maryum: deployed security patch for [[phab:T293556|T293556]]
* 20:30 marxarelli: varnish 500 spike. rolling back
* 21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki {{!}} Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
* 20:20 gehel: restarting blazegraph + updater on wdqs2006
* 20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
* 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.19
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/api/ApiRollback.php: [[phab:T245159|T245159]] ApiRollback: Properly deal with UserIdentity (duration: 01m 04s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245182|T245182]] ResourceLoaderSkinModule: Don't hard-deprecate wgLogoHD just now (duration: 01m 03s)
* 19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T219534|T219534]] Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)
* 19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:10 moritzm: installing e2fsprogs security updates
* 19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots ([[phab:T160122|T160122]])
* 18:48 bblack: ns1.wikimedia.org - re-routing back to authdns2001 instead of dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: {{Gerrit|ac7b4fc2ccc69589e00a42f49d18a8f6d71777f2}}: Revert 727328 ([[phab:T293554|T293554]]) (duration: 00m 56s)
* 18:38 bblack: authdns2001 - reboot - [[phab:T242017|T242017]]
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:36 bblack: ns1.wikimedia.org - re-routing from authdns2001 to dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:09 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I9d0c8af3c577}} (duration: 01m 06s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:00 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iae1f45896}} (duration: 01m 06s)
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:59 volans: downtimed mgmt in eqiad for 1h
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - [[phab:T277193|T277193]] (duration: 00m 57s)
* 17:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iae1f45896}} (duration: 01m 08s)
* 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ibfca686f681}} (duration: 01m 06s)
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:41 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iefff596955e}} (duration: 01m 08s)
* 18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group ([[phab:T293621|T293621]])
* 17:40 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iefff596955e}} (duration: 01m 06s)
* 17:51 mutante: puppet run on all bastion hosts via cumin
* 17:35 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 17:32 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
* 17:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op (code style only) deploy sync (duration: 01m 07s)
* 15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 17:09 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php No-op (code style only) deploy sync (duration: 00m 04s)
* 15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 17:09 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php No-op (code style only) deploy sync
* 15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia [[phab:T292196|T292196]]
* 16:32 robh: ps1-a8-codfw.mgmt.codfw.wmnet firmware upgraded via [[phab:T245164|T245164]]
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 16:28 papaul: rebooting elastic2043 for firmware upgrade
* 15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 16:22 gehel: canceled the restart of elastic2043 - [[phab:T243715|T243715]]
* 14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 16:21 gehel: restarting elastic2043 - [[phab:T243715|T243715]]
* 14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 16:10 _joe_: depooling/repooling mw1240
* 14:54 herron: rebuilt and uploaded kafkatee for bullseye [[phab:T292196|T292196]]
* 16:02 _joe_: pooled mw1238 again
* 14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:59 _joe_: depooling mw1238 for analysis
* 14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:42 vgutierrez: rolling restart of ats-be on esams - [[phab:T170567|T170567]]
* 14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731346{{!}}[beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361)]] (duration: 00m 56s)
* 15:38 vgutierrez: disable allow_half_open on ats-tls @ cp4031 - [[phab:T236458|T236458]]
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:27 vgutierrez: turning on TLSv1.3 between ats-be and applayer in cp30[51-52] - [[phab:T170567|T170567]]
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikibaseMediaInfo/resources/: UBN fix: Force non-value to be undefined (duration: 01m 06s)
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 vgutierrez: test TLSv1.3 between ats-be and applayer in cp3050 - [[phab:T170567|T170567]]
* 14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:47 XioNoX: re-image rpki2001 - [[phab:T244585|T244585]]
* 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 XioNoX: add routinator_0.6.4_amd64.deb to buster-wikimedia apt repo
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10405 and previous config saved to /var/cache/conftool/dbconfig/20200213-142735-marostegui.json
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (2/2) (duration: 00m 56s)
* 14:24 XioNoX: re-enable ping offload in esams - [[phab:T244584|T244584]]
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (1/2) (duration: 00m 56s)
* 13:31 XioNoX: disable ping offload in esams - [[phab:T244584|T244584]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 XioNoX: re-enable ping offload in eqiad - [[phab:T244584|T244584]]
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:731014{{!}}Unconditionally enable Wikibase dispatching via jobs (T291828)]] (duration: 00m 56s)
* 13:06 XioNoX: disable ping offload in eqiad - [[phab:T244584|T244584]]
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:03 XioNoX: re-enable ping offload in codfw - [[phab:T244584|T244584]]
* 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
* 13:00 vgutierrez: pool cp10[75,76] running buster - [[phab:T242093|T242093]]
* 12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:55 Lucas_WMDE: UTC morning backport window done
* 12:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (2/2) (duration: 00m 56s)
* 12:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (1/2) (duration: 00m 56s)
* 12:34 Amir1: EU SWAT is done
* 11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]], take II, the cache issue ([[phab:T219123|T219123]] [[phab:T225055|T225055]]) (duration: 01m 03s)
* 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
* 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]] (duration: 01m 03s)
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:29 vgutierrez: depool cp10[75,76] and reimage as buster - [[phab:T242093|T242093]]
* 11:49 marostegui: Reimage db2079 (codfw s8 master) [[phab:T290868|T290868]]
* 12:28 vgutierrez: pool cp10[77,78] running buster - [[phab:T242093|T242093]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730747{{!}}Set dispatchViaJobsAllowedClients to null everywhere (T291828)]] (duration: 00m 56s)
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571956{{!}}Revert: Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:18 XioNoX: re-image ping2001 to buster - [[phab:T244584|T244584]]
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731239{{!}}Make deduplication actually work for DispatchChangesJob (T291118)]] (duration: 00m 55s)
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1c81925}}: Create Test Custodians group at Beta Wikiversity ([[phab:T240438|T240438]]) (duration: 01m 07s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (2/2) (duration: 00m 56s)
* 12:13 XioNoX: disable ping offload in codfw
* 11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (duration: 00m 56s)
* 12:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0f035e4}}: Update wgAvailableRights declaration of autoreviewprotected ([[phab:T230103|T230103]]) (duration: 01m 03s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|176b0e8}}: Grant autopatrol to azwiki patrollers ([[phab:T244338|T244338]]) (duration: 01m 05s)
* 10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
* 11:53 vgutierrez: depool cp10[77,78] and reimage as buster - [[phab:T242093|T242093]]
* 10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:731237{{!}}Don't filter by change Id when dispatching to client wikis ()]] (duration: 00m 59s)
* 11:52 vgutierrez: pool cp10[79,80] running buster - [[phab:T242093|T242093]]
* 09:48 moritzm: installing node-tar security updates on buster
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - [[phab:T292619|T292619]]
* 11:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:38 godog: sync metrics from graphite1004 to graphite2003 - [[phab:T247963|T247963]]
* 11:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 moritzm: installing apr security updates on bullseye
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 godog: cleanup graphite metrics not modified for >= ~3yr (1024 days)
* 11:18 vgutierrez: rolling upgrade of ATS to version 8.0.5-1wm16 fleet wide - [[phab:T244464|T244464]]
* 07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 11:16 vgutierrez: depool cp10[79,80] and reimage as buster - [[phab:T242093|T242093]]
* 07:34 elukey: depool + restart blazegraph on wdqs1013
* 11:12 ema: A:cp re-enable puppet, leave it to cron to apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 vgutierrez: upload trafficserver 8.0.5-1wm16 to apt.wm.o (buster) - [[phab:T244464|T244464]]
* 06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 vgutierrez: pool cp10[81,82] and reimage as buster - [[phab:T242093|T242093]]
* 06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:59 ema: cp4021 (cache_upload): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:49 ema: cp4027 (cache_text): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 vgutierrez: removing /root/.ssh/known_hosts in cumin1001
* 10:21 vgutierrez: pool cp10[83,84] running buster - [[phab:T242093|T242093]]
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 vgutierrez: depool cp10[83,84] and reimage as buster - [[phab:T242093|T242093]]
* 09:45 vgutierrez: pool cp10[85,86] running buster - [[phab:T242093|T242093]]
* 09:10 moritzm: installing Java security updates on elastic* and relforge*
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10403 and previous config saved to /var/cache/conftool/dbconfig/20200213-085957-marostegui.json
* 08:57 gehel: restart elasticsearch on elastic2051 - JVM upgrade
* 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:57 moritzm: installing Java security updates on Hadoop, Kafka/Jumbo, AQS and Druid canaries
* 07:57 vgutierrez: depool cp10[85,86] and reimage as buster - [[phab:T242093|T242093]]
* 07:53 moritzm: rolling restart of restbase-dev to pick up Java security update
* 07:49 vgutierrez: pool cp10[87,88] running buster - [[phab:T242093|T242093]]
* 07:49 vgutierrez: testing ATS 8.0.5-1wm16 + KA between ats-tls and varnish-fe in cp4031 - [[phab:T244464|T244464]]
* 07:47 moritzm: installing Java security updates on stat/SWAP hosts
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10402 and previous config saved to /var/cache/conftool/dbconfig/20200213-072839-marostegui.json
* 07:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:03 vgutierrez: depool cp10[87,88] and reimage as buster - [[phab:T242093|T242093]]
* 07:02 vgutierrez: pool cp10[89,90] running buster - [[phab:T242093|T242093]]
* 06:49 vgutierrez: pool cp20[02,05] running buster - [[phab:T242093|T242093]]
* 06:36 marostegui: Upgrade and compress db1087, this will generate lag on s8 on the wiki replicas - [[phab:T232446|T232446]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10401 and previous config saved to /var/cache/conftool/dbconfig/20200213-063535-marostegui.json
* 06:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1099:3318 into vslow for s8 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10400 and previous config saved to /var/cache/conftool/dbconfig/20200213-063334-marostegui.json
* 06:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10399 and previous config saved to /var/cache/conftool/dbconfig/20200213-063207-marostegui.json
* 06:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10398 and previous config saved to /var/cache/conftool/dbconfig/20200213-062642-marostegui.json
* 06:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10397 and previous config saved to /var/cache/conftool/dbconfig/20200213-062148-marostegui.json
* 06:19 vgutierrez: testing a new build of ATS 8.0.6 in cp40[26,32]
* 06:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10396 and previous config saved to /var/cache/conftool/dbconfig/20200213-061219-marostegui.json
* 06:11 vgutierrez: depool cp10[89,90] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: depool cp20[02,05] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: pool cp20[01,08] running buster - [[phab:T242093|T242093]]
* 06:02 twentyafterfour: set phabricator read-only to false
* 06:01 twentyafterfour: set phabricator read-only
* 06:00 marostegui: Start phabricator maintenance [[phab:T244566|T244566]]
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:53 marostegui: Upgrade db1128 without restarting mysql - [[phab:T244566|T244566]]
* 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:47 marostegui: Silence m3 hosts for maintenance - [[phab:T244566|T244566]]
* 05:38 vgutierrez: depool cp2008 and reimage as buster - [[phab:T242093|T242093]]
* 05:37 vgutierrez: pool cp2011 running buster - [[phab:T242093|T242093]]
* 05:35 vgutierrez: depool cp2001 and reimage as buster - [[phab:T242093|T242093]]
* 05:34 vgutierrez: pool cp2004 running buster - [[phab:T242093|T242093]]
* 05:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:09 vgutierrez: depool cp20[04,11] and reimage as buster - [[phab:T242093|T242093]]
* 03:57 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:57 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:32 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:10 twentyafterfour: no apparent problems with phabricator upgrade, all done
* 01:01 twentyafterfour: starting phabricator deploy, momentary downtime expected while apache restarts
* 00:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:45 niharika29@deploy1001: Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - [[phab:T244488|T244488]] (duration: 01m 07s)
* 00:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-12 ==
== 2021-10-16 ==
* 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
* 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:53 chaomodus: restart nagios-nrpe-service on cumin1001 after it had oomed
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:10 marxarelli: completed group1 to 1.35.0-wmf.19
* 21:00 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.19 (duration: 01m 03s)
* 20:59 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.19
* 20:49 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232563|T232563]] - Remove SERVER_SOFTWARE override (duration: 01m 03s)
* 20:39 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T72470|T72470]] - Disable wgLegacyJavaScriptGlobals on svwiki (duration: 01m 08s)
* 19:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Don't use hex escapes in the name of cawiki (duration: 01m 04s)
* 19:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243503|T243503]] [itwiki] Move assignment of 'mover' group from sysops to bureaucrats (duration: 01m 02s)
* 19:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243509|T243509]] [zh_classicalwiki] Enable new user message for auto-created accounts (duration: 01m 03s)
* 19:38 James_F: Ran mwscript maintenance/namespaceDupes.php --wiki=mywiki --fix and mwscript maintenance/namespaceDupes.php --wiki=mywiktionary --fix on mwmaint1002
* 19:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244980|T244980]] Localise $wgMetaNamespace for mywiki and mywiktionary (duration: 01m 03s)
* 19:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244205|T244205]] [newiki] Set local timezone to Kathmandu (duration: 01m 03s)
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T241883|T241883]] [fywiktionary] Set a local wgSitename (duration: 01m 03s)
* 19:12 jforrester@deploy1001: Synchronized wmf-config/throttle-analyze.php: Replace deprecated IP class with IPUtils (no-op sync) (duration: 01m 03s)
* 18:31 mutante: irc2001 - manually run the "$<nowiki>{</nowiki>v6_token_cmd<nowiki>}</nowiki> && $<nowiki>{</nowiki>v6_flush_dyn_cmd<nowiki>}</nowiki>" commands from interface::add_ip6_mapped to debug 'Interface::Add_ip6_mapped[main]/Augeas[ens5_v6_token]: Could not evaluate: Saving failed' but it does not reproduce the puppet error ... ([[phab:T244719|T244719]])
* 17:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/pager/IndexPager.php: [[phab:T244941|T244941]] IndexPager: Cast properties passed to implode to arrays (duration: 01m 03s)
* 17:27 jeh: upgrade RAID firmware on cloudvirt1024 to 25.5.6.0009 [[phab:T241884|T241884]]
* 17:22 bblack: ns1.wikimedia.org - re-route back to original authdns2001 destination
* 17:11 brennen: restarting jenkins for updates
* 17:09 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 17:01 vgutierrez: rolling back cp4026 and cp4032 to trafficserver 8.0.5-1wm15
* 17:00 vgutierrez: depool cp40[26,32]
* 16:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:52 vgutierrez: pool cp20[06,14] running buster - [[phab:T242093|T242093]]
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 moritzm: installing openjpeg2 security updates
* 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 vgutierrez: Enable KA and disable parent proxies on cp4031 - [[phab:T244464|T244464]]
* 15:50 vgutierrez: depool cp20[06,14] and reimage as buster - [[phab:T242093|T242093]]
* 15:49 volans: spicerack upgraded to 0.0.30-1 on both cumin hosts
* 15:48 vgutierrez: pool cp20[07,17] running buster - [[phab:T242093|T242093]]
* 15:46 bblack: authdns2001 - shutting down for hardware work - [[phab:T242017|T242017]]
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:39 jeh: clearing foreign drive RAID configuration on cloudvirt1024 [[phab:T241884|T241884]]
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 marostegui: Disable event handler for db1095 RAID check on icinga - [[phab:T244958|T244958]]
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 jeh: upgrade BIOS firmware on cloudvirt1024 to 2.4.8 [[phab:T241884|T241884]]
* 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 vgutierrez: depool cp20[07,17] and reimage as buster - [[phab:T242093|T242093]]
* 14:34 XioNoX: repool eqsin
* 14:31 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 14:00 vgutierrez: pool cp20[10,18] running buster - [[phab:T242093|T242093]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10393 and previous config saved to /var/cache/conftool/dbconfig/20200212-135514-marostegui.json
* 13:39 akosiaris: revert sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:36 XioNoX: re-enable transit/peering on cr1-eqsin - [[phab:T244944|T244944]]
* 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:23 akosiaris: mangle sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 XioNoX: cr1-eqsin RE failover (final) - [[phab:T244944|T244944]]
* 13:21 marostegui: Restart wikibugs as phab comments aren't showing up on irc - [[phab:T241109|T241109]]
* 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:18 jynus: setting up db1140 under maintenance (upgrade, reboot, disable alerts)
* 13:15 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 13:10 moritzm: upgrading debdeploy fleet-wide to 0.0.99.13
* 13:08 moritzm: uploaded libapache2-mod-auth-cas 1.2-1~deb8u1 for jessie-wikimedia to apt.wikimedia.org
* 13:05 vgutierrez: depool cp20[10,18] and reimage as buster - [[phab:T242093|T242093]]
* 13:05 vgutierrez: pool cp20[12,20] running buster - [[phab:T242093|T242093]]
* 12:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 XioNoX: cr1-eqsin RE failover - [[phab:T244944|T244944]]
* 12:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:35 vgutierrez: depool cp20[12,20] and reimage as buster - [[phab:T242093|T242093]]
* 12:34 vgutierrez: pool cp20[13,22] running buster - [[phab:T242093|T242093]]
* 12:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]], take II, the cache issue (duration: 01m 03s)
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 12:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571412{{!}}Enable ContentTranslation out of beta in bs and mk WPs (T244139, T244140)]] (duration: 01m 15s)
* 12:08 vgutierrez: depool cp2013 and reimage as buster - [[phab:T242093|T242093]]
* 12:06 vgutierrez: pool cp2016 running buster - [[phab:T242093|T242093]]
* 12:01 vgutierrez: depool cp20[16,22] and reimage as buster - [[phab:T242093|T242093]]
* 11:57 vgutierrez: pool cp20[19,24] running buster - [[phab:T242093|T242093]]
* 11:53 akosiaris: mangle sessionstore on mw1331 so that it is unreachable. Testing for [[phab:T243106|T243106]]
* 11:49 vgutierrez: repooling cp40[26,32]
* 11:39 vgutierrez: pool cp3050 running buster - [[phab:T242093|T242093]]
* 11:37 vgutierrez: depooling cp[4026,4032]
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 vgutierrez: depool cp2024 and reimage as buster - [[phab:T242093|T242093]]
* 11:17 vgutierrez: pool cp2025 running buster - [[phab:T242093|T242093]]
* 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 vgutierrez: depool cp2016 and reimage as buster - [[phab:T242093|T242093]]
* 11:14 vgutierrez: pool cp2019 running buster - [[phab:T242093|T242093]]
* 11:11 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 11:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 vgutierrez: depool cp3050 and reimage as buster - [[phab:T242093|T242093]]
* 10:49 vgutierrez: pool cp30[51,52] running buster - [[phab:T242093|T242093]]
* 10:45 vgutierrez: depool cp20[19,25] and reimage as buster - [[phab:T242093|T242093]]
* 10:42 vgutierrez: pool cp2026 running buster - [[phab:T242093|T242093]]
* 10:36 vgutierrez: pool cp2023 running buster - [[phab:T242093|T242093]]
* 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:34 moritzm: bouncing ferm on ganeti1016, failed to start after boot
* 10:32 vgutierrez: Enable KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 10:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:12 vgutierrez: testing trafficserver 8.0.6-rc0 in cp40[26,32]
* 10:06 vgutierrez: depool cp20[23,26] and reimage as buster - [[phab:T242093|T242093]]
* 10:01 vgutierrez: depool cp30[51-52] and reimage as buster - [[phab:T242093|T242093]]
* 09:38 ema: cp: rolling ats-tls-restart to enable analytics logging [[phab:T237993|T237993]]
* 09:26 ema: cp4027: ats-tls-restart to enable analytics logging to pipe [[phab:T237993|T237993]]
* 09:25 moritzm: rolling restart of cassandra on restbase-dev to pick up Java security updates
* 09:17 marostegui: Failover m2 master dbproxy from dbproxy1007 to dbproxy1013 - [[phab:T202367|T202367]]
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:11 marostegui: Upgrade and reboot dbproxy1013 before making it master - [[phab:T202367|T202367]]
* 08:55 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@9bbbb58]: (no justification provided) (duration: 00m 05s)
* 08:46 phedenskog@deploy1001: Started deploy [performance/navtiming@9bbbb58]: (no justification provided)
* 08:38 marostegui: Restart wikibugs as it doesn't show phab comments on irc - [[phab:T241109|T241109]]
* 08:21 moritzm: installing mesa security updates
* 07:28 vgutierrez: pool cp30[53-54] running buster - [[phab:T242093|T242093]]
* 07:18 oblivian@puppetmaster1001: conftool action : set/weight=30; selector: dc=eqiad,pool=appserver,name=mw132[3-4].*
* 07:16 oblivian@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,pool=appserver,service=nginx,name=mw12[3-5].*
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 20 for  10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10391 and previous config saved to /var/cache/conftool/dbconfig/20200212-070250-marostegui.json
* 06:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:46 marostegui: Redact ngwikimedia on db1124:3313 and db2094:3313 [[phab:T240772|T240772]]
* 06:22 vgutierrez: depool cp30[53-54] and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 01:48 XioNoX: disabling peering session on cr1-eqsin (they're flapping otherwise)
* 00:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/page/ImageHistoryPseudoPager.php: [[phab:T244937|T244937]] ImageHistoryPseudoPager: Update doQuery() for IndexPager changes (duration: 01m 03s)
* 00:38 XioNoX: reboot cr1-eqsin
* 00:33 XioNoX: commit full on cr1-eqsin - [[phab:T243080|T243080]]
* 00:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: rm wgKartographerIconServer (duration: 01m 02s)
* 00:20 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: rm wgKartographerIconServer (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|ee9edf8137}} to {{Gerrit|55b2afb6eb}}, config revision is {{Gerrit|561ae21f77}}


== 2020-02-11 ==
== 2021-10-15 ==
* 22:04 XioNoX: switchover RE mastership back re0 on cr1-eqsin - [[phab:T243080|T243080]]
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - [[phab:T243080|T243080]]
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:45 cdanis: repool eqiad
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:36 bblack: re-pooling all cp10xx in eqiad
* 22:34 mutante: apt2001 - upgraded nginx
* 21:32 XioNoX: switchover RE mastership on cr1-eqsin - [[phab:T243080|T243080]]
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:14 robh: cp1067 powered back into service post firmware update via [[phab:T243167|T243167]]
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:11 cdanis: depool eqiad
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via [[phab:T243167|T243167]]
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:51 XioNoX: reboot backup RE on cr1-eqsin - [[phab:T243080|T243080]]
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:38 robh: depooling cp108[45] for firmware update via [[phab:T243167|T243167]]
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 02s)
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 03s)
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:08 XioNoX: depool eqsin for router upgrade - [[phab:T243080|T243080]]
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 04s)
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 ([[phab:T233867|T233867]])
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat ([[phab:T240771|T240771]])
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:43 twentyafterfour: getting ready to deploy wmf.18 refs  [[phab:T233866|T233866]]
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:42 greg-g: restarting stashbot
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - [[phab:T242017|T242017]]
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:33 Urbanecm: Create ngwikimedia is done ([[phab:T240771|T240771]])
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 03s)
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia ([[phab:T240771|T240771]])
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:57 bblack: reboot dns2002 post-reimaging
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 vgutierrez: Disable KA on cp4031 - [[phab:T244464|T244464]]
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 16:49 vgutierrez: pool cp3055 running buster - [[phab:T242093|T242093]]
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:43 vgutierrez: repooling cp4031
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:38 vgutierrez: depooling cp4031 for some KA tests
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:25 vgutierrez: pool cp3056 running buster - [[phab:T242093|T242093]]
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - [[phab:T242017|T242017]]
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 16:21 bblack: dns2002 - stopping bird adverts to depool service for [[phab:T242017|T242017]]
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:20 bblack: dns2002 - downtimed in icinga for [[phab:T242017|T242017]]
* 06:20 urbanecm: Start server-side upload for 1 video file
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:38 vgutierrez: depool cp3056 and reimage as buster - [[phab:T242093|T242093]]
* 00:07 brennen: end of UTC late backport & config training window
* 15:36 vgutierrez: pool cp3058 running buster - [[phab:T242093|T242093]]
* 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - [[phab:T242122|T242122]] (duration: 01m 08s)
* 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 vgutierrez: depool cp3055 and reimage as buster - [[phab:T242093|T242093]]
* 14:56 vgutierrez: pool cp3057 running buster - [[phab:T242093|T242093]]
* 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
* 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
* 14:20 vgutierrez: restart varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - [[phab:T242093|T242093]]
* 13:52 vgutierrez: pool cp3059 and cp3060 running buster - [[phab:T242093|T242093]]
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 Amir1: EU SWAT is done
* 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]], take II, cache (duration: 01m 06s)
* 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]] (duration: 01m 05s)
* 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]], Second round, cache issue (duration: 01m 07s)
* 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]] (duration: 01m 11s)
* 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - [[phab:T242093|T242093]]
* 11:59 vgutierrez: repool cp3061 and cp3062 running buster - [[phab:T242093|T242093]]
* 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 vgutierrez: depool cp3062 and reimage as buster - [[phab:T242093|T242093]]
* 10:54 vgutierrez: repool cp3064 running buster - [[phab:T242093|T242093]]
* 10:51 vgutierrez: depool cp3061 and reimage as buster - [[phab:T242093|T242093]]
* 10:50 vgutierrez: repool cp5006 and cp3063 running buster - [[phab:T242093|T242093]]
* 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - [[phab:T242093|T242093]]
* 09:52 vgutierrez: depool cp5006 and reimage as buster - [[phab:T242093|T242093]]
* 09:52 vgutierrez: pool cp5007 running buster - [[phab:T242093|T242093]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
* 08:25 marostegui: Upgrade db1095:3312, db1095:3313
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
* 07:57 akosiaris: [[phab:T242705|T242705]] systemctl stop uwsgi-ores on ores2001.
* 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
* 07:47 marostegui: Upgrade es1013 - [[phab:T239791|T239791]]
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
* 07:23 vgutierrez: depool cp5007 and reimage as buster - [[phab:T242093|T242093]]
* 07:22 vgutierrez: pool cp5001 and cp5008 running buster - [[phab:T242093|T242093]]
* 07:21 marostegui: Remove partitions from db2086:3318 - [[phab:T239453|T239453]]
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
* 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:59 marostegui: Stop haproxy on dbproxy1001 - [[phab:T244463|T244463]]
* 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:48 marostegui: Remove grants in m1 for dbproxy1001 - [[phab:T231280|T231280]]
* 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - [[phab:T202367|T202367]]
* 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
* 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
* 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)


== 2020-02-10 ==
== 2021-10-14 ==
* 23:30 robh: cp108[23] returned to service via [[phab:T243167|T243167]]
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:28 legoktm: restarting zuul
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 04s)
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 07s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via [[phab:T243167|T243167]]
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent  --id 2164
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via [[phab:T243167|T243167]]
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 21:43 arlolra: Updated Parsoid to {{Gerrit|612106d2}} ([[phab:T244412|T244412]], [[phab:T244413|T244413]], [[phab:T242746|T242746]], [[phab:T235273|T235273]], [[phab:T235307|T235307]], [[phab:T238845|T238845]], [[phab:T204618|T204618]], [[phab:T240054|T240054]])
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 21:38 robh: cp1075 & cp1076 offline for bios updates per [[phab:T243167|T243167]]
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}} (duration: 10m 26s)
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}}
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]] (duration: 12m 18s)
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]]
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 11s)
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 15s)
* 22:31 mutante: depooling mw1452 for testing
* 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570393]] Config: Session Store: Switch group0 and group1 to kask-session [[phab:T243106|T243106]] (duration: 01m 06s)
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group ([[phab:T244508|T244508]])
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242122|T242122]] Load new EventStreamConfig extension if so configured (duration: 01m 06s)
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242122|T242122]] Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 05s)
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs [[phab:T233867|T233867]]
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244561|T244561]] Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.refs [[phab:T281168|T281168]]
* 16:48 moritzm: installing libexif security updates on jessie
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - [[phab:T242093|T242093]]
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:45 XioNoX: push outbound flowspec support to core routers
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after first day of 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10366 and previous config saved to /var/cache/conftool/dbconfig/20200210-154552-marostegui.json
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 15:33 godog: roll restart cassandra on session* to apply logging changes - [[phab:T242585|T242585]]
* 18:41 urbanecm: UTC evening B&C done
* 15:23 moritzm: uploading debdeploy 0.0.99.13 to apt.wikimedia.org
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 15:22 godog: roll restart cassandra on restbase* to apply logging changes - [[phab:T242585|T242585]]
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 15:06 marostegui: Reload haproxy on dbproxy1017 and dbproxy1017 - [[phab:T244209|T244209]]
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 15:04 twentyafterfour@deploy1001: Finished scap: full scap sync prior to wmf.18 rollout (duration: 20m 13s)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:04 godog: roll restart cassandra on maps* to apply logging changes - [[phab:T242585|T242585]]
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:03 vgutierrez: rolling restart of ats-tls - [[phab:T240950|T240950]]
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:00 marostegui: Restart mysql on m5 master (wikitech will go down) - [[phab:T244209|T244209]]
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:52 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 14:46 vgutierrez: depool cp5002 and cp5009 and reimage as buster - [[phab:T242093|T242093]]
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:44 twentyafterfour@deploy1001: Started scap: full scap sync prior to wmf.18 rollout
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 14:42 vgutierrez: repool cp5003 and cp5010 running buster - [[phab:T242093|T242093]]
* 17:42 rzl: depool mw1452 for training
* 14:41 marostegui: Full-upgrade db1133 (without restarting mysql) - [[phab:T244209|T244209]]
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:40 twentyafterfour: MediaWiki Train: Running a full scap to prepare for moving forward to 1.35.0-wmf.18 ( [[phab:T233866|T233866]] )
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:32 marostegui: Downtime m5 hosts for the upcoming maintenance - [[phab:T244209|T244209]]
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 14:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 14:11 XioNoX: remove TCP-MSS clamping on cr3-knams
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:48 vgutierrez: depool cp5003 and reimage as buster - [[phab:T242093|T242093]]
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:47 vgutierrez: pooling cp5004 with buster - [[phab:T242093|T242093]]
* 16:33 moritzm: installing node-ansi-regex security updates
* 13:46 vgutierrez: depool cp5010 and reimage as buster - [[phab:T242093|T242093]]
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 13:45 vgutierrez: pooling cp5011 with buster - [[phab:T242093|T242093]]
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 13:28 godog: roll restart cassandra on aqs to apply logging changes - [[phab:T242585|T242585]]
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 13:03 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase: [[gerrit:570911{{!}}Revert "wbterms: Set default for the term store to read new"]] ([[phab:T244529|T244529]]) (duration: 01m 00s)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 13:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 12:58 Urbanecm: EU SWAT is done
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 12:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 58s)
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 59s)
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 12:49 urbanecm@deploy1001: Finished scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]]) (duration: 20m 18s)
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 12:30 vgutierrez: depool cp5004 and reimage as buster - [[phab:T242093|T242093]]
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:29 vgutierrez: pooling cp5005 with buster - [[phab:T242093|T242093]]
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:28 urbanecm@deploy1001: Started scap: SWAT: {{Gerrit|799224f}}: {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]])
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 12:23 vgutierrez: pooling ncredir1001 with buster - [[phab:T243391|T243391]]
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 12:18 _joe_: running puppet, scap pull on mwdebug1001
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 12:17 vgutierrez: upload trafficserver 8.0.5-1wm15 to apt.wm.o (buster) - [[phab:T244538|T244538]]
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:08 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 12:06 vgutierrez: testing ats 8.0.5-1-wm15 on cp4032 - [[phab:T244538|T244538]]
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|014405a}}: Add throttle rules for OSU Editathon and workshop for cawiki, remove expired ones ([[phab:T244608|T244608]], [[phab:T244645|T244645]]) (duration: 01m 03s)
* 14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
* 11:57 vgutierrez: depool ncredir1001 and reimage as buster - [[phab:T243391|T243391]]
* 14:23 moritzm: installing krb5 security updates on KDCs
* 11:57 vgutierrez: pooling ncredir1002 with buster - [[phab:T243391|T243391]]
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 11:43 vgutierrez: pooling cp4027 with buster - [[phab:T242093|T242093]]
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 11:38 vgutierrez: depool ncredir1002 and reimage as buster - [[phab:T243391|T243391]]
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 11:22 vgutierrez: depooling cp5011 and cp5005 & reimage as buster - [[phab:T242093|T242093]]
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:07 vgutierrez: depool cp4027 & reimage as buster - [[phab:T242093|T242093]]
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:07 vgutierrez: pooling ncredir2001 with buster - [[phab:T243391|T243391]]
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 11:03 vgutierrez: pooling cp4028 with buster - [[phab:T242093|T242093]]
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:47 godog: remove old logs from /var/log/swift on swift hosts
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:31 vgutierrez: depool ncredir2001 and reimage as buster - [[phab:T243391|T243391]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:26 vgutierrez: depool cp4028 & reimage as buster - [[phab:T242093|T242093]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:14 moritzm: installing sudo security updates for buster
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:53 vgutierrez: pooling cp4029 with buster - [[phab:T242093|T242093]]
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 1 to 5 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10364 and previous config saved to /var/cache/conftool/dbconfig/20200210-084446-marostegui.json
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 08:43 vgutierrez: pooling ncredir2002 with buster - [[phab:T243391|T243391]]
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:34 effie: rolling restart php-fpm on labweb[1001-1002].wikimedia.org,mw*.eqiad.wmnet,scandium.eqiad.wmnet, wtp[1025-1048].eqiad.wmnet
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:32 effie: update php-apcu on eqiad - [[phab:T236800|T236800]]
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 08:29 effie: rolling restart php-fpm on cloudweb2001-dev.wikimedia.org,mw[2135-2147,2150-2212,2214-2290].codfw.wmnet,wtp[2001-2020].codfw.wmnet
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 08:23 effie: update php-apcu on codfw - [[phab:T236800|T236800]]
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 07:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 07:54 moritzm: updating d-i netinst image for Stretch 9.12 point release (which bumped the kernel ABI)
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 07:29 moritzm: updating d-i netinst image for Buster 10.3 point release (which bumped the kernel ABI)
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 07:09 elukey: restore mw1347's mcrouter settings to its default (proxy threads 10 -> 5)
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Place db1107 - MariaDB 10.4 on s1 with minimal weight - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10363 and previous config saved to /var/cache/conftool/dbconfig/20200210-070140-marostegui.json
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 06:55 vgutierrez: depool ncredir2002 and reimage as buster - [[phab:T243391|T243391]]
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019', diff saved to https://phabricator.wikimedia.org/P10362 and previous config saved to /var/cache/conftool/dbconfig/20200210-065326-marostegui.json
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10361 and previous config saved to /var/cache/conftool/dbconfig/20200210-065135-marostegui.json
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:47 vgutierrez: depool cp4029 & reimage as buster - [[phab:T242093|T242093]]
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019', diff saved to https://phabricator.wikimedia.org/P10360 and previous config saved to /var/cache/conftool/dbconfig/20200210-064553-marostegui.json
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10359 and previous config saved to /var/cache/conftool/dbconfig/20200210-064458-marostegui.json
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 06:39 marostegui: Compress db1124:3318 - this will generate lag on s8 wiki replicas - [[phab:T232446|T232446]]
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10358 and previous config saved to /var/cache/conftool/dbconfig/20200210-063716-marostegui.json
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 06:23 marostegui: Remove partitions from db1099:3311, db1099:3318 [[phab:T239453|T239453]]
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool  db1099:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10357 and previous config saved to /var/cache/conftool/dbconfig/20200210-062112-marostegui.json
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10356 and previous config saved to /var/cache/conftool/dbconfig/20200210-061822-marostegui.json
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10355 and previous config saved to /var/cache/conftool/dbconfig/20200210-061656-marostegui.json
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2020-02-09 ==
== 2021-10-13 ==
* 05:11 cdanis: [[phab:T238305|T238305]] hardreset cp3051
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:47 foks: removing 8 files for legal compliance
* 21:03 foks: removing 2 files for legal compliance
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:16 mutante: gitlab2001 - previously, "gitlab-ctl status" showed the components "gitlab-workhorse" and "postgres-exporter" as "down"; this was either already broken or caused by the restore process. After manually running 'gitlab-ctl start gitlab-workhorse', all components are in "run" and https://gitlab-replica.wikimedia.org is up ([[phab:T285867|T285867]])
* 19:08 mutante: gitlab2001 - started workhorse, which was for some reason marked as down after the restore command ran
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfirmed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:48 moritzm: reverted to clean package state on deneb
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2020-02-08 ==
== 2021-10-12 ==
* 19:12 _joe_: set cpufreq governor to performance on mw1328
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:04 _joe_: restarted php7.2-fpm on mw1332
* 23:16 urbanecm: UTC late B&C window done
* 16:53 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 12.24.27.50
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 16:47 gjg@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Editathon in Charlotte (duration: 00m 58s)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 00:05 Jeff_Green: switched payments.wikimedia.org to codfw datacenter due to [[phab:T244610|T244610]]
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:12 moritzm: installing rsync bugfix updates
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:34 urbanecm: UTC morning B&C window done
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 07:22 moritzm: installing RT security updates
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2020-02-07 ==
== 2021-10-11 ==
* 22:20 jeh: ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 20:47 mutante: OS install on new install_server VMs worked on second attempt, issues are gone. signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs ([[phab:T224576|T224576]])
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 20:42 jeh: ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 20:32 mutante: ganeti: attempting to reinstall install1003 which failed last time
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 17:36 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync InitializeSettings again for lols refs [[phab:T233866|T233866]] (duration: 01m 03s)
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 17:32 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs [[phab:T233866|T233866]] (duration: 01m 02s)
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 17:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs [[phab:T233866|T233866]]
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 17:19 marostegui: Start MySQL on es1019 after onsite maintenance [[phab:T243963|T243963]]
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:53 moritzm: install apache security updates on buster
* 16:38 filippo@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 16:13 XioNoX: remove MSS clamping from eqiad/eqord/knams/esams
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 16:05 andrew@deploy1001: Finished deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]] (duration: 03m 45s)
* 12:04 moritzm: install apache security updates on bullseye
* 16:04 vgutierrez: pooling cp4030 with buster - [[phab:T242093|T242093]]
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 16:03 bblack: removing GRE MTU mitigations from cp[135]xxx - [[phab:T232602|T232602]]
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 16:01 andrew@deploy1001: Started deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]]
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 15:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 15:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 15:25 vgutierrez: depool & reimage cp4030 as buster - [[phab:T242093|T242093]]
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 15:21 vgutierrez: pooling cp4031 with buster - [[phab:T242093|T242093]]
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 15:20 vgutierrez: pooling ncredir3001 running buster - [[phab:T243391|T243391]]
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 15:18 marostegui: Restart all instances on db1124 and db1125 to pick up a new replication filter - [[phab:T240094|T240094]]
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 15:11 marostegui: Restart all instances on db2094 and db2095 to pick up a new replication filter - [[phab:T240094|T240094]]
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 14:43 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 40s)
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 14:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop ([[phab:T244578|T244578]])
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 14:40 hoo@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 14:38 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 20s)
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 14:33 vgutierrez: depool and reimage ncredir3001 as buster - [[phab:T243391|T243391]]
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 14:32 vgutierrez: depool & reimage cp4031 as buster - [[phab:T242093|T242093]]
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 14:23 vgutierrez: pooling ncredir3002 running buster - [[phab:T243391|T243391]]
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]
* 13:26 vgutierrez: pooling cp4021 with buster - [[phab:T242093|T242093]]
* 13:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:51 vgutierrez: depool and reimage ncredir3002 as buster - [[phab:T243391|T243391]]
* 12:42 vgutierrez: depool & reimage cp4021 as buster - [[phab:T242093|T242093]]
* 12:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:57 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 vgutierrez: pooling ncredir5001 running buster - [[phab:T243391|T243391]]
* 11:24 vgutierrez: pooling cp4022 with buster - [[phab:T242093|T242093]]
* 11:09 akosiaris: undo wikifeeds experiments
* 11:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:42 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:36 akosiaris: conduct experiments with stopping/starting uwsgi-ores on ores2001 [[phab:T242705|T242705]]
* 10:24 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:23 vgutierrez: depool and reimage ncredir5001 as buster - [[phab:T243391|T243391]]
* 10:14 vgutierrez: depool & reimage cp4022 as buster - [[phab:T242093|T242093]]
* 10:02 akosiaris: increase capacity for wikifeeds by 50% [[phab:T244535|T244535]]
* 10:02 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:53 ema: A:mw: increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 09:09 godog: roll restart cassandra instance on restbase-dev
* 09:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:03 godog: restart cassandra on restbase-dev1004 to test logging pipeline onboard
* 09:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 08:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P10343 and previous config saved to /var/cache/conftool/dbconfig/20200207-085846-marostegui.json
* 08:54 marostegui: Upgrade db1090:3312, db1090:3317
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10342 and previous config saved to /var/cache/conftool/dbconfig/20200207-085432-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10341 and previous config saved to /var/cache/conftool/dbconfig/20200207-084447-marostegui.json
* 08:44 moritzm: installing libexif security updates
* 08:21 akosiaris: deploy https://gerrit.wikimedia.org/r/570726 [[phab:T244535|T244535]] to avoid CPU throttling of wikifeeds
* 08:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Increase base weight for db1126', diff saved to https://phabricator.wikimedia.org/P10340 and previous config saved to /var/cache/conftool/dbconfig/20200207-075323-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10339 and previous config saved to /var/cache/conftool/dbconfig/20200207-075234-marostegui.json
* 07:48 marostegui: Remove revision partitions from db2085:3318 [[phab:T239453|T239453]]
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fullyy repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10338 and previous config saved to /var/cache/conftool/dbconfig/20200207-074511-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10337 and previous config saved to /var/cache/conftool/dbconfig/20200207-074407-marostegui.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10336 and previous config saved to /var/cache/conftool/dbconfig/20200207-074258-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10335 and previous config saved to /var/cache/conftool/dbconfig/20200207-073130-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10334 and previous config saved to /var/cache/conftool/dbconfig/20200207-073026-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10333 and previous config saved to /var/cache/conftool/dbconfig/20200207-063831-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10332 and previous config saved to /var/cache/conftool/dbconfig/20200207-063402-marostegui.json
* 06:31 elukey: force a puppet run on all ores[12] nodes
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10331 and previous config saved to /var/cache/conftool/dbconfig/20200207-062731-marostegui.json
* 06:26 marostegui: Reboot db1107 for update - [[phab:T242702|T242702]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10330 and previous config saved to /var/cache/conftool/dbconfig/20200207-062502-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10329 and previous config saved to /var/cache/conftool/dbconfig/20200207-062345-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10328 and previous config saved to /var/cache/conftool/dbconfig/20200207-062043-marostegui.json
* 04:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:24 robh: eqsin pdu work ongoing starting now.  ps1-603 swapping per [[phab:T242250|T242250]]
* 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-06 ==
== 2021-10-09 ==
* 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244133|T244133]] [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T159711|T159711]] [[phab:T161365|T161365]] [[phab:T164435|T164435]] [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Don't trying to assign  to  if it's unset (duration: 01m 07s)
* 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: [[phab:T242184|T242184]] Change tags method so anon edits will go through (duration: 01m 08s)
* 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is ([[phab:T242606|T242606]])
* 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs [[phab:T233866|T233866]]
* 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:52 akosiaris: restart all wikifeeds pods
* 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:45 akosiaris: restart restbase on restbase1027
* 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 20:30 twentyafterfour: sync-wikiversions --force
* 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Set wgLogoHD before adding wordmark (duration: 01m 06s)
* 19:36 bblack: re-pool cp1075 (eqiad text)
* 19:33 addshore: SWAT done!
* 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: [[phab:T244479|T244479]] Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
* 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
* 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
* 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
* 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
* 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]], sync again for luck (duration: 01m 06s)
* 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]] (duration: 01m 07s)
* 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 10s)
* 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
* 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
* 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}} (duration: 06m 27s)
* 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}}
* 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:32 herron: set performance cpu scaling governor on maps*
* 16:49 vgutierrez: pooling ncredir5002 running buster - [[phab:T243391|T243391]]
* 16:38 vgutierrez: pooling cp4023 with buster - [[phab:T242093|T242093]]
* 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
* 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
* 16:35 XioNoX: remove AS prepending in esams/knams
* 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
* 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
* 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:07 vgutierrez: depool and reimage ncredir5002 as buster - [[phab:T243391|T243391]]
* 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:03 vgutierrez: depool & reimage cp4023 as buster - [[phab:T242093|T242093]]
* 16:03 vgutierrez: pooling cp4024 with buster - [[phab:T242093|T242093]]
* 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
* 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]] (duration: 04m 35s)
* 15:56 vgutierrez: pooling ncredir4001 running buster - [[phab:T243391|T243391]]
* 15:55 moritzm: installing qemu security updates
* 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]]
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:50 moritzm: installing python-ecdsa security updates
* 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:41 moritzm: installing jsoup security updates
* 15:30 vgutierrez: depool & reimage ncredir4001 as buster - [[phab:T243391|T243391]]
* 15:29 vgutierrez: depool & reimage cp4024 as buster - [[phab:T242093|T242093]]
* 15:28 vgutierrez: pooling ncredir4002 running buster - [[phab:T243391|T243391]]
* 15:27 moritzm: installing sudo security updates on jessie
* 15:23 vgutierrez: pooling cp4025 with buster - [[phab:T242093|T242093]]
* 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 godog: extend graphite1004 / graphite2003 fs +200G
* 14:56 vgutierrez: depool and reimage ncredir4002 as buster - [[phab:T243391|T243391]]
* 14:46 vgutierrez: depool & reimage cp4025 as buster - [[phab:T242093|T242093]]
* 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
* 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump [[phab:T241145|T241145]]
* 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - [[phab:T243963|T243963]]
* 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
* 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
* 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
* 13:31 XioNoX: reboot cr3-knams
* 13:31 elukey: depool mw1347 to test some mcrouter settings
* 13:27 XioNoX: deactivate BGP transits on cr3-knams
* 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - [[phab:T244464|T244464]]
* 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
* 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
* 13:00 addshore: SWAT done
* 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
* 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
* 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]], due to [[phab:T244479|T244479]] (duration: 01m 07s)
* 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 06s)
* 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 07s)
* 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 04s)
* 12:40 vgutierrez: pooling cp3065 - [[phab:T242093|T242093]]
* 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 [[phab:T243395|T243395]] (duration: 01m 07s)
* 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
* 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
* 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
* 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5e1cbb2}}: Enable CX in te, kn, gu, mr and pawiki as a default tool ([[phab:T243271|T243271]], [[phab:T243272|T243272]], [[phab:T243273|T243273]], [[phab:T243274|T243274]], [[phab:T243275|T243275]]) (duration: 01m 09s)
* 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
* 11:38 kart_: Updated cxserver to 2020-02-05-051751-production ([[phab:T244230|T244230]], [[phab:T234323|T234323]])
* 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
* 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
* 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
* 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - [[phab:T244464|T244464]]
* 10:00 vgutierrez: depool and reimage cp3065 as buster - [[phab:T242093|T242093]]
* 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - [[phab:T242093|T242093]]
* 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} (duration: 11m 41s)
* 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}}
* 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet (duration: 00m 29s)
* 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet
* 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
* 08:18 dcausse: restarting blazegraph on wdqs1006: [[phab:T242453|T242453]]
* 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
* 06:46 elukey: run puppet on all ores[12]* nodes
* 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm


== 2020-02-05 ==
== 2021-10-08 ==
* 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}} (duration: 10m 48s)
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}}
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) ([[phab:T244389|T244389]])
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}} (duration: 03m 07s)
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}}
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 20:50 mutante: ores1004 - systemctl start celery-ores-worker
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]] (duration: 01m 07s)
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 20:25 mutante: mw1267 restarting php7.2-fpm
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs [[phab:T233866|T233866]]
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:09 moritzm: installing git security updates for jessie
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:00 moritzm: installing unzip security updates
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 19:44 mutante: LDAP - added spramduya to wmf group ([[phab:T243802|T243802]])
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238029|T238029]] Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 19:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync back revert of {{Gerrit|975b4bbb9}} (duration: 01m 06s)
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 vgutierrez: pooling cp5012 - [[phab:T242093|T242093]]
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 18:23 vgutierrez: rebooting cp5012 - [[phab:T242093|T242093]]
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 18:21 elukey: restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 17:51 mutante: ganeti1017 - rebooting (not in use yet)
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/languages/: [[phab:T244300|T244300]] (duration: 01m 13s)
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:33 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/includes/: [[p