You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99))
imported>Stashbot
(legoktm: uploaded python-logstash to buster-wikimedia for T294393)
 
(306 intermediate revisions by 3 users not shown)
== 2020-11-09 ==
== 2021-10-26 ==
* 22:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:59 legoktm: uploaded python-logstash to buster-wikimedia for [[phab:T294393|T294393]]
* 22:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 21:14 mbsantos@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]]) (duration: 02m 23s)
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:11 mbsantos@deploy1001: Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]])
* 21:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 20:53 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=maps2002.*
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:36 cdanis: depool maps2002
* 21:04 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: [[phab:T293783|T293783]] (duration: 01m 02s)
* 20:26 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 01m 09s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:25 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 21:03 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/tests/phpunit/includes/api/query/ApiQueryImageInfoTest.php: [[phab:T293783|T293783]] (duration: 01m 02s)
* 20:24 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 11m 36s)
* 21:01 reedy@deploy1002: Synchronized php-1.38.0-wmf.6/includes/api/ApiQueryImageInfo.php: [[phab:T293783|T293783]] (duration: 01m 03s)
* 20:13 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 21:00 reedy@deploy1002: Synchronized php-1.38.0-wmf.5/includes/api/ApiQueryImageInfo.php: [[phab:T293783|T293783]] (duration: 01m 03s)
* 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.16
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:58 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mepps: updated payments-wiki from {{Gerrit|388490e86d}} to {{Gerrit|8612ed1002}}, config revision is {{Gerrit|987e839869}}
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:53 XioNoX: re-order asw-d-codfw interfaces-ranges
* 19:51 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]]
* 17:51 XioNoX: standardize asw-d-codfw interfaces descriptions
* 19:48 eileen: civicrm revision changed from {{Gerrit|733a8fceda}} to {{Gerrit|dba74c443b}}, config revision is {{Gerrit|eed79486d5}}
* 17:33 effie: updating mwdebug2002 to ICU 63 - [[phab:T264991|T264991]]
* 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:57 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 05s)
* 19:38 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]] (duration: 25m 28s)
* 16:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 16:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.6  refs [[phab:T293947|T293947]]
* 16:40 moritzm: imported 2.0.2+0.5.7-1~wmf3+php72+buster1 to component/php72 for buster-wikimedia
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bullseye
* 16:34 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=trwiki; [[phab:T246539|T246539]])
* 17:52 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 01m 34s)
* 16:34 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 17:50 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 16:20 XioNoX: Netbox prod: mass import from PuppetDB (cables, etc) - [[phab:T262899|T262899]]
* 17:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 37s)
* 16:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:06 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 15:55 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS (duration: 1100m 51s)
* 15:12 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62c2e02f836095ba7e8c7b80d97a52aee885b619}}: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis ([[phab:T266298|T266298]]) (duration: 01m 07s)
* 16:25 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:34 hashar: Restarting Gerrit
* 16:25 cdanis@cumin1001: START - Cookbook sre.network.cf
* 14:07 hashar@deploy1001: Finished deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]] (duration: 00m 18s)
* 16:24 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_wikidata_resubmit_changes_for_dispatch
* 14:07 hashar@deploy1001: Started deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]]
* 16:23 mutante: mwmaint1002 - running puppet, created new mw periodic job from gerrit:732972 ([[phab:T294031|T294031]])
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 16:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:03 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 15:45 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:59 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:38 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 13:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:07 topranks: Running homer against cr3-esams to create new temp GRE tunnel to asw1-b12-drmrs
* 13:40 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:02 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 12:13 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwikinews --fix --add-prefix=BROKEN # [[phab:T266925|T266925]]
* 15:02 cdanis@cumin1001: START - Cookbook sre.network.cf
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11b8f6236d159962bdebccd6dcacb72e600ec6b5}}: Add wgNamespaceAliases for zhwikinews ([[phab:T266925|T266925]]) (duration: 01m 06s)
* 14:55 topranks: Adding static route on cr3-esams to asw1-b12-drmrs Telia link IP to allow GRE to be built.
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87b3eede24fb407ddd226ad65817ab8adf44aeb8}}: Enable DiscussionTools as a beta feature on fiwiki ([[phab:T265446|T265446]]) (duration: 01m 06s)
* 13:50 elukey: ran "Capirca Host Definition" script on netbox - output https://netbox.wikimedia.org/extras/scripts/results/1787315/
* 11:58 moritzm: installing remaining openldap updates on stretch
* 13:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase: {{Gerrit|7723cf724df9ede49129443e43336e93efcd7a41}}: RecentChangeFactory: Add missing rc_logid value ([[phab:T293885|T293885]]) (duration: 01m 02s)
* 11:57 jynus: restart dbstore1004 mariadb instances
* 13:40 elukey: ran "Capirca Host Definition" script on netbox-next to get up-to-date aqs_group host definition - result https://netbox-next.wikimedia.org/extras/scripts/results/894348/
* 10:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:24 kart_: Updated cxserver to 2021-10-25-123807-production ([[phab:T217747|T217747]], [[phab:T218217|T218217]], [[phab:T292421|T292421]])
* 10:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:19 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:36 XioNoX: add 185.15.56.240/29 IPs to relevant cloudsw interfaces - [[phab:T265288|T265288]]
* 13:13 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:35 effie: merging 638109 and roll restart ms-fe* hosts to pick up the change
* 13:05 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:11 XioNoX: renumber cloud-xlink1-eqiad
* 13:05 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.4 (duration: 31m 07s)
* 09:56 Urbanecm: Purge https://vote.wikimedia.org/wiki/Main_Page ([[phab:T262689|T262689]])
* 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=svwiki; [[phab:T246539|T246539]])
* 12:35 hashar: scap clean --delete 1.38.0-wmf.4 # [[phab:T293947|T293947]]
* 09:52 hashar: Restarting Gerrit on gerrit1001 and gerrit2001  in order to have the JVM to exit after OutOfMemory  # [[phab:T267517|T267517]]
* 12:32 hashar: Applied security patches to 1.38.0-wmf.6 # [[phab:T293947|T293947]]
* 09:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b0a81f4294dcedfd5736884900cb561de9a080e}}: Revert "Change votewiki language temporarily to fa for fawiki elections" ([[phab:T262689|T262689]]) (duration: 01m 08s)
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:37 moritzm: installing libexif security updates
* 12:31 hashar: scap prep 1.38.0-wmf.6 # [[phab:T293947|T293947]]
* 09:06 godog: enable thanos query-frontend on thanos-fe hosts - [[phab:T261281|T261281]]
* 12:16 jbond: upload cas_6.4.2-1+wmf10u3_amd64
* 08:24 XioNoX: configure traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 12:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:11 hashar: Restarting Gerrit on gerrit1001 and gerrit2001
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 hashar: Restarted CI Jenkins on contint2001 for Java upgrade
* 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:17 elukey: restart gerrit on gerrit2001 (OOM registered for two days ago, uptime from systemctl since a month ago, probably in a weird state)
* 11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:35 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/tests/phpunit/maintenance/categoryChangesAsRdfTest.php: this was cherry-picked to make CI pass, pushing it out just for a clean staging dir (duration: 01m 06s)
* 11:51 urbanecm@deploy1002: Finished scap: {{Gerrit|c131f32e5e0804c8f5c2ec768b334c81a1b35151}}: Add namespace translations for [ami] Amis and [pwn] Paiwan ([[phab:T292414|T292414]], [[phab:T292415|T292415]]) (duration: 02m 25s)
* 01:32 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 11:49 urbanecm@deploy1002: Started scap: {{Gerrit|c131f32e5e0804c8f5c2ec768b334c81a1b35151}}: Add namespace translations for [ami] Amis and [pwn] Paiwan ([[phab:T292414|T292414]], [[phab:T292415|T292415]])
* 01:30 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 07s)
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:29 tstarling@deploy1001: sync-file aborted: fixing UBN [[phab:T266903|T266903]] (duration: 00m 01s)
* 11:13 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|575a6a66b279c3d2d8974ffcc4911cc5b927be47}}: Fix HD logo size in some wikis ([[phab:T250731|T250731]]; 2/2) (duration: 00m 55s)
* 11:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|575a6a66b279c3d2d8974ffcc4911cc5b927be47}}: Fix HD logo size in some wikis ([[phab:T250731|T250731]]; 1/2) (duration: 00m 57s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 jbond: upload cas_6.4.2-1+wmf10u2_amd64.deb
* 10:40 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift
* 10:39 mvernon@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=swift-ro
* 10:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:07 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Switching back graphite to eqiad (duration: 00m 55s)
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:06 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Switching back graphite to eqiad (duration: 01m 04s)
* 09:49 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:49 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:47 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 09:40 godog: flip back write traffic to graphite1004 (all but mediawiki) - [[phab:T247963|T247963]]
* 09:27 godog: move read traffic back to graphite1004 - [[phab:T247963|T247963]]
* 08:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:33 ema: upload varnish_6.0.8-1wm2 to component/varnish6 on apt.wm.org [[phab:T293879|T293879]]
* 08:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/maintenance: {{Gerrit|91316ed5714c4426a29fefded5c4db08dbba48bb}}: Add purgeExpiredMentorStatus.php ([[phab:T280307|T280307]]) (duration: 00m 56s)
* 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 07:07 effie: pool mw1319 and mw1312
* 07:05 effie: pool  wtp1026.eqiad.wmnet
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17606 and previous config saved to /var/cache/conftool/dbconfig/20211026-063647-root.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17605 and previous config saved to /var/cache/conftool/dbconfig/20211026-062144-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17604 and previous config saved to /var/cache/conftool/dbconfig/20211026-060640-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17603 and previous config saved to /var/cache/conftool/dbconfig/20211026-055136-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17602 and previous config saved to /var/cache/conftool/dbconfig/20211026-053633-root.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17601 and previous config saved to /var/cache/conftool/dbconfig/20211026-052129-root.json
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:24 krinkle@deploy1002: Synchronized wmf-config/logging.php: {{Gerrit|I0211e1c77}} (duration: 00m 55s)
* 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
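
The db1109 entries above (05:21 through 06:36) show a host being repooled in stages: 5%, 10%, 25%, 50%, 75%, then 100%, with roughly 15 minutes between commits. The sketch below only reproduces that timetable for illustration. The function name `repool_schedule`, the `STEPS` list, and the fixed 15-minute interval are assumptions read off the log timestamps; this is not how dbctl or any Wikimedia repooling script is implemented.

```python
from datetime import datetime, timedelta

# Percentages as they appear in the db1109 log entries (assumed to be the full ramp).
STEPS = [5, 10, 25, 50, 75, 100]


def repool_schedule(start, steps=STEPS, interval_minutes=15):
    """Return (timestamp, percentage) pairs for a staged repool.

    Hypothetical helper: the interval is inferred from the log timestamps,
    not taken from any real tool's configuration.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first "(re)pooling @ 5%" entry.
schedule = repool_schedule(datetime(2021, 10, 26, 5, 21))
```

Iterating over `schedule` and formatting each timestamp with `strftime("%H:%M")` gives the minute marks to compare against the logged commits.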


== 2020-11-08 ==
== 2021-10-25 ==
* 23:08 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 23:12 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Create alias for Appendix and Appendix_talk namespaces on mywiktionary ([[phab:T291146|T291146]]) (duration: 00m 55s)
* 23:06 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 35s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:34 cdanis: repool esams
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:57 ryankemper: [wcqs] Downtimed `wcqs*` until roughly a week from now (while we setup oauth)
* 19:48 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:53 legoktm: uploaded PHP 7.4.25 to apt.wm.o (DSA-4992-1)
* 19:16 cdanis: depool esams
* 22:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@e908052] (wcqs): Deploy 0.3.90 to WCQS
* 18:35 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 03m 04s)
* 18:35 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:27 ryankemper@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 21:53 mutante: new project language "pwn" added - Paiwan is a native language of Taiwan, spoken by the Paiwan, a Taiwanese indigenous people. [[phab:T292415|T292415]]
* 21:52 mutante: new project language "ami" added - Sowal no 'Amis is the Formosan language of the 'Amis (or Ami), an indigenous people living along the east coast of Taiwan. - [[phab:T292414|T292414]]
* 21:50 mutante: log authdns1001 (DNS) - sudo authdns-update, add new project language "ami" (Amis) for [[phab:T292414|T292414]] - edited langlist.tmpl which regenerates all project zones
* 21:40 mutante: authdns1001 (DNS) - sudo authdns-update, add new project language "pwn" (Paiwan) for [[phab:T292415|T292415]]
* 19:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2255.codfw.wmnet with reason: DRAC upgrade
* 19:47 mutante: mw2255 - depooled=inactive (incl "dsh groups"), shut down physically for [[phab:T283582|T283582]] - can be worked on anytime
* 19:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2255.codfw.wmnet
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 19:42 mutante: icinga - ACKing all unhandled CRIT alerts on hosts with "dev" or "test" in their name, regardless of notifications being disabled or not. just so that we get more signal than noise in actual unhandled CRITs in web UI
* 19:40 mutante: cumin2002 - sudo systemctl reset-failed to clear Icinga alert about failed but (now) non-existing service database-backups-snapshots.service, assuming it's a case of "only in active DC"
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 19:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: hardware fail
* 19:07 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily move mw groups to db1123 [[phab:T294295|T294295]]', diff saved to https://phabricator.wikimedia.org/P17597 and previous config saved to /var/cache/conftool/dbconfig/20211025-190717-kormat.json
* 19:06 mutante: db1112 - powercycling
* 19:04 legoktm@cumin1001: dbctl commit (dc=all): 'Depool db1112 ([[phab:T294295|T294295]])', diff saved to https://phabricator.wikimedia.org/P17596 and previous config saved to /var/cache/conftool/dbconfig/20211025-190436-legoktm.json
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:40 jforrester@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/timeline/includes/Timeline.php: Backport: [[gerrit:734312{{!}}Input may be null when rendering a self-closing tag `<timeline />` (T294020)]] (duration: 00m 55s)
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 55s)
* 18:22 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:732971{{!}}Fix some easy codestyle issues]] (duration: 00m 54s)
* 18:19 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732840{{!}}Fix array declaration of NS_USER_TALK abbreviation on ruwikiquote (T197058)]] (duration: 00m 55s)
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:15 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:732836{{!}}flaggedrevs: Drop legacy wgFlaggedRevsStatsAge config, no longer read]] (duration: 00m 55s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732254{{!}}Make reply tool available as opt-out on frwiki (T293687)]] (duration: 00m 56s)
* 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 17:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 17:39 mutante: mw2253 - scap pull after hw maintenance is over
* 17:32 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:24 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:22 XioNoX: update core routers ACLs
* 17:20 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:49 XioNoX: update management routers ACLs
* 16:36 XioNoX: DNS - Add eqsin-ulsfo transport v6 prefix - [[phab:T273308|T273308]]
* 16:31 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:25 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 16:25 mmandere@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:21 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2253.codfw.wmnet
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:734298{{!}}Empty wikibase disabled access entity types on Beta (T294159)]] (beta-only) (duration: 01m 47s)
* 16:04 mmandere@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:01 mmandere@cumin2002: START - Cookbook sre.dns.netbox
* 15:57 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 52s)
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:734328{{!}} Bumping portals to master (T128546)]] (duration: 01m 54s)
* 15:46 jbond: upgrade cas/idp to 6.4.2
* 14:56 mutante: mw2253 - shut down and downtimed for 2 days
* 14:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2253.codfw.wmnet with reason: DRAC upgrade
* 14:49 mutante: depooling mw2253 for DRAC upgrade ([[phab:T283582|T283582]])
* 14:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 14:45 jbond: update cas package
* 14:31 marostegui: Deploy schema change on s3 codfw - [[phab:T291719|T291719]]
* 12:04 ema: cp3062: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 11:57 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm2 [[phab:T293879|T293879]]
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 Lucas_WMDE: UTC morning backport+config window done
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732969{{!}}Remove dispatchLagToMaxLagFactor Wikibase setting (T292604)]] (duration: 00m 54s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732951{{!}}Remove wikibaseDispatchRedisLockManager config (T292604)]] (duration: 00m 54s)
* 11:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732950{{!}}Remove wmg variables for dispatchChanges.php Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732949{{!}}Remove dispatchChanges.php-related Wikibase settings (T292604)]] (duration: 00m 55s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:732372{{!}}Remove dispatchViaJobs-related Wikibase settings (T291828)]] (duration: 00m 56s)
* 09:52 godog: bounce uwsgi graphite web on graphite2003 - [[phab:T294220|T294220]]
* 09:52 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:48 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:733089{{!}}[BETA CLUSTER] Enable WikibaseLexeme Scribunto access (T294159)]] (merged on Friday, syncing now to avoid outdated files even if it’s just -labs.php) (duration: 00m 55s)
* 09:18 godog: bounce graphite-web on graphite2003 to test timeout bump - [[phab:T294220|T294220]]
* 08:08 XioNoX: merge DNS changes to add drmrs
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:50 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 05:47 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=wtp1026.*
* 05:43 _joe_: pooling wtp1042 [[phab:T294212|T294212]]
* 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1109.eqiad.wmnet with OS buster
* 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1109.eqiad.wmnet with OS buster
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 (s8) for reimage [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17590 and previous config saved to /var/cache/conftool/dbconfig/20211025-043028-marostegui.json


== 2020-11-06 ==
== 2021-10-23 ==
* 23:38 dwisehaupt: frdata1001 upgraded to buster
* 16:40 dcausse: restarting blazegraph on wdqs1004 and wdqs1006 (free allocators alert)
* 22:40 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
* 15:45 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]]), testing whether [[phab:T291137|T291137]] is still an issue
* 22:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
* 22:29 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
* 22:29 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
* 20:57 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 05s)
* 20:56 reedy@deploy1001: Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 10s)
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 cwhite@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 17:02 dwisehaupt: rolled out new thank_you_mail_send process_control scripts to utilize frmx hosts
* 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
* 14:46 moritzm: installing wireshark security updates
* 14:36 hnowlan: resyncing database on maps1001
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:05 hnowlan: started cassandra bootstrap of maps2005
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 hnowlan: joining maps2005 to cassandra cluster
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 moritzm: uploaded openjdk-8 8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
* 10:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:06 dcausse: restarted elastic on elastic1063 ([[phab:T265113|T265113]])
* 09:57 moritzm: installing spice security updates
* 09:32 moritzm: installing libsndfile security updates
* 09:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 moritzm: installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
* 04:38 ryankemper: [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
* 04:36 ryankemper: Finished restarting wdqs categories one host at a time across all wdqs production instances
* 04:02 ryankemper: Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
* 04:01 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:01 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:00 ryankemper: `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
* 03:59 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
* 03:51 ryankemper: Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
* 03:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
* 03:48 ryankemper: About to begin wdqs deploy, tests passing on canary `wdqs1003`
* 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]]) (duration: 69m 02s)
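
The 04:02 wdqs entry above restarts `wdqs-categories` one host at a time (`cumin -b 1`), running depool, wait, restart, wait, pool on each host. A minimal Python sketch of that ordering follows. It is not how cumin itself works: `rolling_restart`, `run`, and `sleep` are hypothetical names introduced for illustration, and the caller is assumed to supply the function that actually executes a command on a host.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    run: Callable[[str, str], None],   # hypothetical: run(host, command)
    sleep: Callable[[float], None],    # injected so the sketch can be dry-run
    drain_seconds: float = 60,         # matches "sleep 60" in the logged command
    settle_seconds: float = 30,        # matches "sleep 30" in the logged command
) -> List[str]:
    """Process hosts strictly one at a time and return those completed."""
    done: List[str] = []
    for host in hosts:
        run(host, "depool")
        sleep(drain_seconds)           # let in-flight requests drain
        run(host, "systemctl restart wdqs-categories")
        sleep(settle_seconds)          # let the service come back up
        run(host, "pool")
        done.append(host)
    return done
```

If `run` raises, the loop stops before `pool` is reached for that host, which is similar to how the `&&` chain in the logged command short-circuits on a failing step.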


== 2020-11-05 ==
== 2021-10-22 ==
* 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]])
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - FormatMetData.php (T267370)]] (duration: 07m 22s)
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370)]] (duration: 01m 08s)
* 20:57 bblack: re-pooling eqiad in DNS
* 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: [[gerrit:639504{{!}}Bump wikimedia/parsoid to 0.13.0-a16 (T267146)]] (duration: 01m 14s)
* 20:54 legoktm: <XioNoX> I disabled the interface on cr1, going to re-enable the active one on cr2
* 20:54 hnowlan: reenabled tilerator in eqiad
* 20:48 legoktm: bblack has temporarily depooled eqiad https://gerrit.wikimedia.org/r/733043
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
* 20:41 XioNoX: disable sessions to equinix eqiad IXP
* 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
* 19:17 urbanecm: Start server-side upload of 1 video file ([[phab:T294134|T294134]])
* 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 15:06 jbond: upload puppetboard_3.1.0-1_all.deb to bullseye-wikimedia
* 20:39 hnowlan: finished removenode of maps2002 cassandra
* 13:42 ema: deployment-cache-upload06: restart varnish-frontend, package got upgraded to 6.0.8 [[phab:T294116|T294116]]
* 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
* 13:30 jbond: upload python3-pypuppetdb_2.4.0-1_all.deb to bullseye
* 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 10:46 jbond: upload cas_6.4.2-1+wmf10u1
* 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: [[gerrit:639500{{!}}Dont double-format numeric edit count (T267362)]] (duration: 01m 06s)
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 19:44 Urbanecm: Morning B&C window done
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: {{Gerrit|81cb1c7b141d49d7fc931fdc13ffd1b48b3a25ab}}: Suggested edits: Export task count from start editing dialog ([[phab:T266868|T266868]]; [[phab:T263040|T263040]]) (duration: 01m 07s)
* 09:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294029|T294029]]
* 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|453b9c64c44a256eafdfafe7a0023484377bbbd2}}: Fix DiscussionTools wikis config for thwiki/tgwiki ([[phab:T266303|T266303]]) (duration: 01m 08s)
* 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS buster
* 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
* 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 17:52 akosiaris: restart uwsgi-ores in all ores1* nodes per complaint on IRC that max redis clients have been reached [[phab:T263910|T263910]]
* 08:27 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
* 08:24 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
* 08:23 ema: cp3062: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 08:00 ema: deployment-cache-text06: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 17:41 brennen: train is currently unblocked; rolling to group0 ([[phab:T263182|T263182]])
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17580 and previous config saved to /var/cache/conftool/dbconfig/20211022-055403-root.json
* 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17579 and previous config saved to /var/cache/conftool/dbconfig/20211022-053900-root.json
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17578 and previous config saved to /var/cache/conftool/dbconfig/20211022-052356-root.json
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17577 and previous config saved to /var/cache/conftool/dbconfig/20211022-050852-root.json
* 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17576 and previous config saved to /var/cache/conftool/dbconfig/20211022-045349-root.json
* 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: [[gerrit:639491{{!}}language: Clean up $separatorTransformTable in km/la/my (T267091)]] (duration: 01m 12s)
* 04:46 marostegui_: Deploy schema change on s8 codfw - [[phab:T291719|T291719]]
* 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: [[gerrit:639495{{!}}mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311)]] (duration: 01m 10s)
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17575 and previous config saved to /var/cache/conftool/dbconfig/20211022-043845-root.json
* 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
* 02:59 ejegg: updated payments-wiki from {{Gerrit|088a8cda1e}} to {{Gerrit|6e810fb401}}
* 17:14 hnowlan: rebuilding cassandra on maps2002
* 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
* 17:05 hnowlan: restarting maps2004 postgres for config change
* 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
* 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:41 moritzm: installing junit4 security updates
* 14:55 elukey: shutdown kafka-jumbo1001 to swap NICs (1g -> 10g)
* 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond42: enable puppet fleet wide after restarting puppetdb
* 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 jbond42: disable puppet fleet wide to restart puppetdb
* 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:52 jbond42: upgrade freetype on jessie
* 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
* 12:09 marostegui: Upgrade mysql on pc2010
* 11:58 jynus: shutting down db1139 in preparation of maintenance [[phab:T261405|T261405]]
* 11:55 marostegui: Upgrade mysql on db1077
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
* 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; [[phab:T246539|T246539]])
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - [[phab:T262512|T262512]]
* 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
* 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
* 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimum weight after being cloned from es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimum weight after being cloned from es1013 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimum weight after being cloned from es1016 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T267216|T267216]]', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
* 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 due to partition 'e' full
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
* 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
* 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
* 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
* 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]
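
The `(re)pooling @ N%` entries above step a database host back into service in stages, roughly 15 minutes apart. A minimal sketch of such a schedule follows, assuming the percentages seen in the db1126 entries (5, 10, 25, 50, 75, 100) and a fixed 15-minute interval. `repool_schedule` and `STEPS` are hypothetical names for illustration and are not part of dbctl or conftool.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Percentages taken from the db1126 log entries; other hosts in the log
# start at a different step, so treat this as an assumption.
STEPS: Sequence[int] = (5, 10, 25, 50, 75, 100)


def repool_schedule(
    start: datetime,
    interval: timedelta = timedelta(minutes=15),  # assumed fixed spacing
    steps: Sequence[int] = STEPS,
) -> List[Tuple[datetime, int]]:
    """Return (time, percentage) pairs for a staged repool starting at `start`."""
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    for when, pct in repool_schedule(datetime(2021, 10, 22, 4, 38)):
        print(f"{when:%H:%M} pool at {pct}%")
```

The logged commits drift by a few seconds per step (the 100% commit for db1126 lands at 05:54, not 05:53), so the real process evidently waits slightly longer than 15 minutes between steps.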


== 2020-11-04 ==
== 2021-10-21 ==
* 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee0ba541fa55f6707276fdc5bd3f032cb9be3e60}}: Disable the search in header A/B test ([[phab:T265333|T265333]]) (duration: 01m 06s)
* 23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 20:33 ejegg: updated payments-wiki from {{Gerrit|1ad4ba9639}} to {{Gerrit|388490e86d}}
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 01m 07s)
* 23:37 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|82579bf9d71bd3c9d97da0132ce8d92a8863da5b}}: Enable wgImagePreconnect on remaining wikis ([[phab:T123582|T123582]]) (duration: 01m 06s)
* 23:36 jforrester@deploy1002: Synchronized multiversion/: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2a57725f8f6fdaa3f40c834e84b43a0260077f2}}: Enable DiscussionTools as a beta feature on almost all Wikipedias ([[phab:T266303|T266303]]) (duration: 01m 07s)
* 23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fb5c03262c20b5e99b3c2f6e91abb024f12da1f5}}: Enable wgCheckUserLogLogins at all wikis but loginwiki ([[phab:T253802|T253802]]) (duration: 01m 08s)
* 23:33 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
* 23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:51 Urbanecm: Strip 2FA for Mark83 at SUL ([[phab:T267257|T267257]])
* 23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:730946{{!}}CommonSettings: Drop legacy CentralAuth config flag, never read (T277932)]] (duration: 00m 55s)
* 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
* 23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
* 23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 22:42 mutante: [[phab:T294038|T294038]] [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created.  . .Successfully sent email
* 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
* 21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
* 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
* 21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 17:07 effie: Reimage mc1036 for real this time
* 20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
* 16:40 brennen: 1.36.0-wmf.16 was branched at {{Gerrit|f51ccd2ccef8cba0e7d874b6f7cf4b73bcd36636}} for [[phab:T263182|T263182]]
* 20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
* 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 15:39 effie: Reimage mc1036 to buster - [[phab:T252391|T252391]]
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - [[phab:T259163|T259163]] (duration: 00m 58s)
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 00m 59s)
* 19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container ([[phab:T293050|T293050]]) (duration: 00m 55s)
* 14:37 jynus: restart mysql at db1133 [[phab:T266483|T266483]]
* 19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
* 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
* 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
* 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:53 urbanecm: Deploy security patch for [[phab:T285116|T285116]] (wmf.4, wmf.5)
* 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
* 18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
* 14:15 jynus: restart mysqls at db209[789],db210[01], db2139, db2141 [[phab:T266483|T266483]]
* 17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
* 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on [[phab:T294010|T294010]] (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
* 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:59 jynus: restart mysqls at db1150 [[phab:T266483|T266483]]
* 17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:54 jynus: restart mysqls at db1145 [[phab:T266483|T266483]]
* 17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:51 jynus: restart mysqls at db1140 [[phab:T266483|T266483]]
* 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:47 jynus: restart mysqls at db1139 [[phab:T266483|T266483]]
* 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:43 jynus: restart mysqls at db1116 [[phab:T266483|T266483]]
* 16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 13:40 jynus: restart mysqls at db1102 [[phab:T266483|T266483]]
* 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (2/3) (duration: 00m 54s)
* 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 13:35 jynus: restart mysqls at db1095 [[phab:T266483|T266483]]
* 16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (2/3) (duration: 00m 55s)
* 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (1/3) (duration: 00m 57s)
* 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 54s)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 Lucas_WMDE: my next message accidentally says 1/3 again but it’s 2/3, sorry
* 12:50 Lucas_WMDE: EU backport&config done
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 12:11 Urbanecm: Run scap pull at snapshot1010 manually
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 Urbanecm: scap-sync file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
* 14:56 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ed3c43dc4488205663e6694b7ddfa991e3f3d4b9}}: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267193|T267193]]) (duration: 01m 02s)
* 14:42 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/Wikibase.default.php: Backport: [[gerrit:732666{{!}}Enable dispatching via jobs by default (T291828)]] (duration: 00m 55s)
* 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client/: Backport: [[gerrit:732674{{!}}Fix ExternalUserNames service wiring for local database]] (duration: 00m 57s)
* 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:56 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:34 volans: uploaded spicerack_1.0.6 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5 refs [[phab:T281169|T281169]]
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; [[phab:T246539|T246539]])
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
* 11:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 Lucas_WMDE: UTC morning backport+config window done
* 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
* 11:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294008|T294008]]
* 10:05 _joe_: restarting envoyproxy on restbase20<nowiki>{</nowiki>09,10<nowiki>}</nowiki> to test poolcounter usage by the safe restart scripts
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:07 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730848{{!}}Configure event stream for map tiles state change (T289771)]] (duration: 01m 04s)
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:47 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:14 jbond: merging refactor of P:base Gerrit:714975
* 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:56 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 03s)
* 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 08:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
* 08:26 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
* 08:25 ema: cp3062: revert vsl_space experiment [[phab:T293879|T293879]]
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
* 08:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1004.eqiad.wmnet with OS bullseye
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17563 and previous config saved to /var/cache/conftool/dbconfig/20211021-080330-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
* 07:56 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1004.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17562 and previous config saved to /var/cache/conftool/dbconfig/20211021-074826-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17561 and previous config saved to /var/cache/conftool/dbconfig/20211021-073323-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17560 and previous config saved to /var/cache/conftool/dbconfig/20211021-071819-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17559 and previous config saved to /var/cache/conftool/dbconfig/20211021-070315-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17558 and previous config saved to /var/cache/conftool/dbconfig/20211021-064812-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
* 06:35 elukey: `systemctl reload nginx` on cloudelastic100[5,6] to pick up the new TLS certificate and clear alerts - [[phab:T293826|T293826]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
* 04:47 marostegui: Deploy schema change on s5 codfw - [[phab:T291719|T291719]]
* 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
* 04:37 marostegui: Deploy schema change on s6 codfw - [[phab:T291719|T291719]]
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
* 04:04 legoktm: restarted apache on lists1001 so it only uses new TLS cert ([[phab:T293826|T293826]])
* 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 [[phab:T261717|T261717]]
* 03:29 eileen: civicrm revision changed from {{Gerrit|e889831012}} to {{Gerrit|733a8fceda}}, config revision is {{Gerrit|eed79486d5}}
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
* 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
* 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - [[phab:T253438|T253438]]
* 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json


== 2020-11-03 ==
== 2021-10-20 ==
* 22:56 _joe_: repooling mw1346
* 23:56 thcipriani@deploy1002: Finished scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]] (duration: 11m 41s)
* 22:55 _joe_: depooling mw1346
* 23:44 thcipriani@deploy1002: Started scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]]
* 22:49 cdanis: mw1342 restart-php7.2-fpm
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 cdanis: repool mw1278 and mw1279
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:35 cdanis: ✔️ cdanis@mw1290.eqiad.wmnet ~ 🕠🍺 sudo restart-php7.2-fpm
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace [[phab:T291018|T291018]] (duration: 01m 02s)
* 22:31 cdanis: depool mw1276 and mw1279 also
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:25 cdanis: ✔️ cdanis@mw1278.eqiad.wmnet ~ 🕠🍺 sudo depool
* 23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace [[phab:T291018|T291018]] (duration: 01m 04s)
* 21:16 hashar: Gerrit: triggering java garbage collection # [[phab:T263008|T263008]]
* 22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
* 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
* 22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
* 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
* 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM [[phab:T265113|T265113]]
* 21:50 dancy: Testing a series of one-file scap sync-file runs
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9cf996a38d82fdd67e600a5a951e88423957e8d}}: Promote Growth features out of darkmode on several wikis ([[phab:T291826|T291826]], [[phab:T255037|T255037]], [[phab:T287878|T287878]]) (duration: 01m 04s)
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:38 eileen: civicrm revision changed from {{Gerrit|9b5e0d015b}} to {{Gerrit|e889831012}}, config revision is {{Gerrit|eed79486d5}}
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:25 legoktm: uploaded php7.4 on buster to apt.wm.o ([[phab:T293449|T293449]])
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
* 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 mutante: disabling puppet on mw* for a minute to deploy a change
* 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org -  [[phab:T293810|T293810]]
* 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
* 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
* 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - [[phab:T293860|T293860]] (duration: 01m 03s)
* 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - [[phab:T293895|T293895]] (duration: 01m 03s)
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - [[phab:T293894|T293894]] (duration: 01m 09s)
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
* 16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:13 jbond: upload cas_6.4.2-1_amd64.deb
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
* 15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
* 14:57 moritzm: installing modsecurity-crs security updates on Buster
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 moritzm: installing xmlgraphics-commons security updates on Buster
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:46 moritzm: installing irssi security updates on Buster
* 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 moritzm: installing commons-io security updates on Buster
* 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 ema: cp3062: test higher vsl_space values [[phab:T293879|T293879]]
* 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
* 14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:43 sobanski: Removing db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 14:12 moritzm: installing ruby2.3 security updates
* 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:40 moritzm: installing apache2 security updates on buster
* 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
* 13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 01m 02s)
* 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
* 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
* 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M [[phab:T293879|T293879]] - varnish restart needed
* 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
* 12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 03s)
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:02 urbanecm@deploy1002: Finished scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]]) (duration: 25m 19s)
* 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
* 11:23 hnowlan: resyncing postgres replica maps1001
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
* 11:03 Amir1: rolling restart of ores
* 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:37 urbanecm@deploy1002: Started scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]])
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
* 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 07s)
* 11:21 moritzm: installing ffmpeg security updates
* 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e520fc57411bb19123766192cd636396ea6fc59d}}: GrowthExperiments: Add campaign pattern for enwiki ([[phab:T293699|T293699]]) (duration: 01m 22s)
* 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 26s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
* 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
* 10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
* 09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
* 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
* 06:35 marostegui: Upgrade db1106
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 06:31 dcausse: restarting blazegraph on wdqs1012
* 08:32 godog: Prometheus re-enable compactions - [[phab:T261281|T261281]]
* 06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
* 06:59 marostegui: Remove db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 06:21 marostegui: Depool clouddb1013 for upgrade
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl [[phab:T267088|T267088]]', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
* 06:46 marostegui: Deploy schema change on s1 codfw master: [[phab:T265349|T265349]]
* 06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:16 marostegui: Stop MySQL on es1014 to clone es1028 [[phab:T261717|T261717]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
* 06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
* 06:05 XioNoX: put transport link between ulsfo and eqsin in service - [[phab:T273308|T273308]]
* 06:11 marostegui: Stop MySQL on es1012 to clone es1027 [[phab:T261717|T261717]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
* 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:04 marostegui: Stop MySQL on es1011 to clone es1026 [[phab:T261717|T261717]]
* 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
* 04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis ([[phab:T288848|T288848]]) (duration: 01m 05s)
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
* 01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:39 cstone: civicrm revision changed from {{Gerrit|cd13d9e30f}} to {{Gerrit|b1342c4129}}
* 01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:13 shdubsh: restart ES on logstash1009 - oom killed
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 00:00 tgr: west coast evening deploys done
* 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime


== 2020-11-02 ==
== 2021-10-19 ==
* 22:19 twentyafterfour: restart php7.3-fpm on phab1001
* 23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103{{!}}Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s)
* 22:03 twentyafterfour: applied {{Gerrit|113a244a66}} on phab1001 to hotfix [[phab:T240862|T240862]]
* 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:22 eileen: process-control config revision is {{Gerrit|313a36312f}} re-enable thank you
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053{{!}}ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s)
* 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 eileen: civicrm revision changed from {{Gerrit|3317d30356}} to {{Gerrit|cd13d9e30f}}, config revision is {{Gerrit|db912e3bba}}
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 eileen: process-control config revision is {{Gerrit|db912e3bba}} - thankyou job off for testing
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565{{!}}Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s)
* 19:07 Urbanecm: Deployed security fix for [[phab:T205908|T205908]]
* 23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953{{!}}Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s)
* 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 mutante: decom'ing testvm1001
* 23:13 tgr@deploy1002: Synchronized static: Config: [[gerrit:731231{{!}}Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s)
* 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete {{!}} fixed Icinga alert: RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: [[phab:T165885|T165885]]
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:56 ejegg: updated payments-wiki from {{Gerrit|0f48acea49}} to {{Gerrit|30e596903d}}
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: {{Gerrit|a84a675}}: {{Gerrit|3231578}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: {{Gerrit|694580a}}: {{Gerrit|c02e301}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:14 XioNoX: push new pfw policies - [[phab:T267051|T267051]]
* 18:30 foks: deleting 1 more email with deleteUserEmail.php
* 16:39 ejegg: updated payments-wiki from {{Gerrit|adc3369cb3}} to {{Gerrit|1ad4ba9639}}
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1476a2d93}}: {{Gerrit|dd8393c1a0}}: foundationwiki: Restrict sensitive namespaces to editor group ([[phab:T205350|T205350]]) (duration: 01m 03s)
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a2893c7190e615a247674dbf7f87348bf43b91c}}: Enable topic subscriptions as a beta feature on all remaining projects ([[phab:T287802|T287802]]) (duration: 01m 04s)
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (2/2) (duration: 01m 06s)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (1/2) (duration: 01m 05s)
* 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
* 17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
* 16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - [[phab:T266024|T266024]] (duration: 00m 58s)
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 13:40 elukey: roll restart zookeeper on an-conf* to pick up new openjdk upgrades
* 16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 13:03 Lucas_WMDE: EU backport&config window done
* 16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: [[gerrit:637801{{!}}Revert JS parser commits (T266671)]] (duration: 01m 09s)
* 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637819{{!}}Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917)]] (duration: 00m 58s)
* 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 2/2 (Beta) (duration: 00m 57s)
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 1/2 (production) (duration: 01m 02s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:638020{{!}}Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]] (duration: 00m 58s)
* 15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
* 15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]], Beta part (prod no-op) (duration: 00m 58s)
* 15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - [[phab:T277193|T277193]] (duration: 01m 04s)
* 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]] (duration: 00m 59s)
* 15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
* 15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 11:51 effie: disable puppet on thumbor1001 and thumbor1002 to test 636024
* 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
* 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - [[phab:T261281|T261281]]
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while I merge G:662699
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 45m 13s)
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 moritzm: installing openldap security updates on corp LDAP replicas
* 13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:46 XioNoX: add uRPF strict to ulsfo office links - [[phab:T266561|T266561]]
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:41 moritzm: installing openldap security updates on LDAP replicas
* 13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - [[phab:T261281|T261281]]
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
* 12:40 moritzm: installing aftpd security updates
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
* 12:34 marostegui: Upgrade dbstore1003
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
* 12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - [[phab:T288843|T288843]]
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
* 12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: {{Gerrit|ec0125770775c1a1a54c3b592d86d287fd9e3ad6}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 55s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
* 12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: {{Gerrit|79808a90a95dd5dac2b532b87fb7ec1a490ea0f0}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 56s)
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - [[phab:T288843|T288843]]
* 11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
* 11:46 marostegui: Upgrade db1105 (s1,s2)
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c31b04e50101a60db7ae8acae64bc031f5e1007}}: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
* 10:56 marostegui: Upgrade clouddb1021
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 10:51 moritzm: failover master in ganeti-test to ganeti2026
* 10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
* 10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
* 10:38 oblivian@deploy1002: Synchronized w/static.php: Config: [[gerrit:730182{{!}}static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:37 marostegui: Upgrade db1101 (s7,s8)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - [[phab:T247963|T247963]]
* 10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
* 10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
* 09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
* 09:37 godog: move graphite/statsd writes to graphite2003 - [[phab:T247963|T247963]]
* 09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3  # [[phab:T281169|T281169]]
* 09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # [[phab:T281169|T281169]]
* 09:19 marostegui: Stop slave on db2112 [[phab:T290865|T290865]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:03 XioNoX: push anycast tuning to all Telia transit links - [[phab:T288843|T288843]]
* 08:50 godog: point graphite.discovery.wmnet to graphite2003 - [[phab:T247963|T247963]]
* 08:40 XioNoX: push prep-work for anycast tuning to all sites - [[phab:T288843|T288843]]
* 08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
* 08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
* 08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - [[phab:T288843|T288843]]
* 08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
* 06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
* 06:06 marostegui: Upgrade dbstore1005
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
* 06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:03 marostegui: Upgrade db1184, db1178
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 05:46 marostegui: Reimage db2112 (s1 codfw master) [[phab:T290865|T290865]]
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2020-11-01 ==
== 2021-10-18 ==
* 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # [[phab:T266976|T266976]]
* 23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b654980240d51fff3c6e9c48f7076d4609c2560f}}: Create an alias for the Draft namespace on hrwiki ([[phab:T291755|T291755]]) (duration: 00m 56s)
* 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
* 23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # [[phab:T291761|T291761]]
* 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abe777d28594da852e49ccb1c1597b2598f3e483}}: Create Rhymes namespace for thwiktionary ([[phab:T291761|T291761]]) (duration: 00m 57s)
* 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json
* 23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests ([[phab:T288848|T288848]]) (duration: 00m 56s)
* 22:06 maryum: deployed security patch for [[phab:T293589|T293589]]
* 21:23 maryum: deployed security patch for [[phab:T293556|T293556]]
* 21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki {{!}} Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
* 20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots ([[phab:T160122|T160122]])
* 19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: {{Gerrit|ac7b4fc2ccc69589e00a42f49d18a8f6d71777f2}}: Revert 727328 ([[phab:T293554|T293554]]) (duration: 00m 56s)
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - [[phab:T277193|T277193]] (duration: 00m 57s)
* 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group ([[phab:T293621|T293621]])
* 17:51 mutante: puppet run on all bastion hosts via cumin
* 15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
* 15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia [[phab:T292196|T292196]]
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 14:54 herron: rebuilt and uploaded kafkatee for bullseye [[phab:T292196|T292196]]
* 14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731346{{!}}[beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361)]] (duration: 00m 56s)
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (2/2) (duration: 00m 56s)
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (1/2) (duration: 00m 56s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:731014{{!}}Unconditionally enable Wikibase dispatching via jobs (T291828)]] (duration: 00m 56s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
* 12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:55 Lucas_WMDE: UTC morning backport window done
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (2/2) (duration: 00m 56s)
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (1/2) (duration: 00m 56s)
* 11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 marostegui: Reimage db2079 (codfw s8 master) [[phab:T290868|T290868]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730747{{!}}Set dispatchViaJobsAllowedClients to null everywhere (T291828)]] (duration: 00m 56s)
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731239{{!}}Make deduplication actually work for DispatchChangesJob (T291118)]] (duration: 00m 55s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (2/2) (duration: 00m 56s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (duration: 00m 56s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
* 10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:731237{{!}}Don't filter by change Id when dispatching to client wikis ()]] (duration: 00m 59s)
* 09:48 moritzm: installing node-tar security updates on buster
* 09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - [[phab:T292619|T292619]]
* 09:38 godog: sync metrics from graphite1004 to graphite2003 - [[phab:T247963|T247963]]
* 09:13 moritzm: installing apr security updates on bullseye
* 08:57 godog: cleanup graphite metrics not modified for >= ~3yr (1024 days)
* 07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 07:34 elukey: depool + restart blazegraph on wdqs1013
* 07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-10-31 ==
== 2021-10-16 ==
* 00:12 mutante: removed Nuria from wmf group, she is already in nda group ([[phab:T266086|T266086]])
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2020-10-30 ==
== 2021-10-15 ==
* 23:35 foks: removing two files for legal compliance
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet [[phab:T266702|T266702]]
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:34 mutante: apt2001 - upgraded nginx
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - [[phab:T266164|T266164]]
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 20:56 mutante: mw1267,mw1268 - scap pull
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple hosts down for maintenance
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]] (duration: 05m 14s)
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]]'
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 17:19 effie: disable puppet on mc1036 and mc2036 - [[phab:T252391|T252391]]
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:18 effie: enable puppet on all mediawiki and mc* hosts
* 06:20 urbanecm: Start server-side upload for 1 video file
* 16:19 elukey: kafka-jumbo1006 still running with 1g nick
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 15:36 effie: stopping puppet on mediawiki and mc* hosts
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:07 brennen: end of UTC late backport & config training window
* 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 rzl: downtiming mc2036 for buster reimage
* 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
* 14:14 cmjohnson1: moving mw1267 and mw1268 to rack A8 eqiad [[phab:T266164|T266164]]
* 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
* 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the birthday (duration: 01m 12s)
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:54 elukey: decom an-tool1006 (old analytics test vm) - [[phab:T255139|T255139]]
* 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-10-29 ==
== 2021-10-14 ==
* 23:59 eileen: process-control config revision is {{Gerrit|6891d35bce}}
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:39 Urbanecm: Evening B&C window done
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # [[phab:T266605|T266605]] # P13112
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddb7e08e9c1d07f704c9f7585d8b6089f1895b5c}}: Add namespace aliases to Turkish Wikiquote ([[phab:T266605|T266605]]) (duration: 00m 57s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:36 eileen: process-control config revision is {{Gerrit|1114512f90}}
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # [[phab:T266606|T266606]] # P13111
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3a8555154673c4c5a65f6ec2a1219d0832f48e0}}: Add namespace aliases to Turkish Wikisource ([[phab:T266606|T266606]]) (duration: 00m 56s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # [[phab:T266608|T266608]]
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1800d11ec8c07ff6ccffe0fd03ce11e6786f8a6e}}: Add namespace aliases to Turkish Wikibooks ([[phab:T266608|T266608]]) (duration: 00m 57s)
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 23:22 eileen: civicrm revision changed from {{Gerrit|e1d65b0f3a}} to {{Gerrit|3317d30356}}, config revision is {{Gerrit|d70fe02cb9}}
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix # [[phab:T266609|T266609]]
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|090f75730727e7a3ca5a85af0ff9071213dd047f}}: Add namespace aliases to Turkish Wiktionary ([[phab:T266609|T266609]]) (duration: 00m 58s)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 22:35 mutante: mw1268 - depooled for [[phab:T266164|T266164]]
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scap proxy was removed - deploy1001 scap-proxies dsh group was adjusted
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically ([[phab:T266164|T266164]])
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install [[phab:T263284|T263284]]
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:31 mutante: depooling mw1452 for testing
* 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 22:06 mutante: depooled mw1267 ([[phab:T266164|T266164]])
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:41 urbanecm: UTC evening B&C done
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; [[phab:T246539|T246539]])
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 19:13 Amir1: rolling restart of ores uwsgi
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote ([[phab:T266744|T266744]]) (duration: 00m 57s)
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 17:42 rzl: depool mw1452 for training
* 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # [[phab:T266744|T266744]]
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b7eaaab81e1665c478f5dc1fdb495e36c53e7863}}: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální ([[phab:T245639|T245639]]) (duration: 00m 57s)
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:29 hashar: Restarted CI Jenkins a bit ago
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 17:15 hashar: CI: killed all java agents (java upgrade)
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:12 hashar: Stopping CI Jenkins
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - [[phab:T265288|T265288]]
* 16:33 moritzm: installing node-ansi-regex security updates
* 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - [[phab:T265288|T265288]]
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 16:34 XioNoX: force VRRP master on cr1-eqiad - [[phab:T265288|T265288]]
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:22 moritzm: installing bacula updates from Buster point release
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: {{Gerrit|483c3bceb926ac6a2cfc40112fb9b4f0671fef72}}: Attempt to add a query cache to DPL ([[phab:T263220|T263220]]) (duration: 00m 58s)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 15:16 papaul: poweroff mc2029 for relocation
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|19c5aff02c20812c56b8abdcc0ed530393010193}}: Set wgDLPQueryCacheTime to 120 at all wikis ([[phab:T263220|T263220]]) (duration: 00m 59s)
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - [[phab:T265911|T265911]]
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 14:59 papaul: poweroff sessionstore2002 for relocation
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
* 14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
* 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 14:23 moritzm: installing krb5 security updates on KDCs
* 14:08 godog: bump FS for prometheus codfw global instance
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:54 elukey: roll out profile::java on all zookeeper instances
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 13:53 moritzm: installing Java 11 security updates
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 13:52 bblack: authdns1001 - restart gdnsd - [[phab:T266746|T266746]]
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 13:46 bblack: authdns2001 - restart gdnsd - [[phab:T266746|T266746]]
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:25 Urbanecm: Correction: Obviously 1002 ([[phab:T246539|T246539]])
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; [[phab:T246539|T246539]])
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 13:21 moritzm: installing bluez security updates on stretch
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 12:56 marostegui: Make orchestrator discover pc2 [[phab:T266485|T266485]]
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:55 marostegui: Deploy orchestrator grants on pc2 [[phab:T266485|T266485]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:44 marostegui: Deploy grants for cluster alias on pc1 [[phab:T266485|T266485]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:35 moritzm: upgrade idp-test* hosts to latest Java security updates
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:35 moritzm: restart idp-test
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 11:14 Urbanecm: EU B&C window done
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|28152b7387082b79d71cfbf28be740ffe629ee50}}: Add another SDC property to search for matching media statements ([[phab:T264925|T264925]]) (duration: 00m 58s)
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - [[phab:T266746|T266746]]
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) [[phab:T264109|T264109]]
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm)
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - [[phab:T258405|T258405]]
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 [[phab:T266663|T266663]]
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 [[phab:T266663|T266663]]
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:50 vgutierrez: restart haproxy on authdns2001
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 [[phab:T266663|T266663]]
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 [[phab:T266663|T266663]]
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 [[phab:T266663|T266663]]
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 [[phab:T266663|T266663]]
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 [[phab:T266663|T266663]]
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 [[phab:T266663|T266663]]
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 [[phab:T266663|T266663]]
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 [[phab:T266663|T266663]]
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 [[phab:T266663|T266663]]
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore ([[phab:T257906|T257906]])
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:17 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
* 00:14 Amir1: rolling restart of ores
* 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
* 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)


== 2020-10-28 ==
== 2021-10-13 ==
* 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 23:52 ryankemper@deploy1001: deploy aborted:  0.3.53 (duration: 00m 00s)
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:54 mutante: scandium - scap pull after reinstalling OS
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:47 foks: removing 8 files for legal compliance
* 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
* 21:03 foks: removing 2 files for legal compliance
* 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:56 jgleeson: updated Smashpig from {{Gerrit|2246685626}} to {{Gerrit|09f29c1da5}}
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ([[phab:T285867|T285867]])
* 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:08 mutante: gitlab2001 - started workhorse which was for some reason marked as down after restore command ran
* 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 18:56 tgr_: Morning deploys done
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983{{!}}Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfirmed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791{{!}}Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956{{!}}Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787{{!}}Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880{{!}}Removing obsolete license definition]] (duration: 01m 00s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 17:30 hnowlan: reimporting OSM data for eqiad
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 17:24 hnowlan: removing OSM database on maps1004
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:16 hnowlan: Disabling tilerator in eqiad
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:51 Amir1: restarting uwsgi on ores in eqiad
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:10 godog: roll restart logstash5 in codfw
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 12:39 moritzm: installing libdatetime-timezone-perl  updates
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - [[phab:T266561|T266561]]
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:39 ema: due to [[phab:T266651|T266651]], cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - [[phab:T266648|T266648]]
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 moritzm: reverted to clean package state on deneb
* 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:37 jynus: updated dump grants on db2093
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - [[phab:T257905|T257905]]
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 04:43 ryankemper: [[phab:T266492|T266492]] Finished rolling restart of codfw cirrus cluster
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 02:58 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 02:12 eileen: tools revision changed from {{Gerrit|a2a91d6c6a}} to {{Gerrit|087a596d3a}}
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 00:40 eileen: civicrm revision changed from {{Gerrit|4fdfb8408b}} to {{Gerrit|e1d65b0f3a}}, config revision is {{Gerrit|f16003ab62}}
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2020-10-27 ==
== 2021-10-12 ==
* 22:20 mutante: systemctl reset-failed on various servers to see which are coming back later from failed auto_restart and which don't
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
* 23:16 urbanecm: UTC late B&C window done
* 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job ([[phab:T266024|T266024]])
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:40 eileen: civicrm revision changed from {{Gerrit|bb7c08bf6d}} to {{Gerrit|4fdfb8408b}}, config revision is {{Gerrit|f16003ab62}}
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 17:18 ejegg: updated payments-wiki from {{Gerrit|4c1503ad91}} to {{Gerrit|adc3369cb3}}
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:42 mepps: updated payments-wiki-staging from {{Gerrit|5fdd29bc16}} to {{Gerrit|4c1503ad91}}
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:25 ema: cp4032: downgrade varnish to 6.0.4 [[phab:T264398|T264398]]
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 [[phab:T266567|T266567]]
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 [[phab:T266567|T266567]]
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 14:27 _joe_: restart php-fpm on jobrunners in codfw
* 17:12 moritzm: installing rsync bugfix updates
* 14:17 cdanis: ran puppet on alert1001
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:34 urbanecm: UTC morning B&C window done
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago [[phab:T261487|T261487]]
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - [[phab:T265589|T265589]]
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - [[phab:T265589|T265589]]
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - [[phab:T265589|T265589]]
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - [[phab:T265589|T265589]]
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - [[phab:T265589|T265589]]
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 07:22 moritzm: installing RT security updates
* 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 06:42 ryankemper: [[phab:T263970|T263970]] Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2020-10-26 ==
== 2021-10-11 ==
* 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set ([[phab:T266501|T266501]]) (duration: 01m 00s)
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 22:30 mutante: netflow5001 - systemctl reset-failed
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 12:53 moritzm: install apache security updates on buster
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 12:04 moritzm: install apache security updates on bullseye
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]
* 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
* 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
* 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]]) (duration: 06m 53s)
* 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]])
* 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) ([[phab:T265556|T265556]]) (duration: 00m 57s)
* 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki ([[phab:T264990|T264990]]) (duration: 00m 58s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki ([[phab:T265690|T265690]]) (duration: 00m 58s)
* 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A ([[phab:T265372|T265372]], [[phab:T265556|T265556]]) (duration: 00m 58s)
* 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs ([[phab:T266285|T266285]]) (duration: 00m 58s)
* 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code ([[phab:T71729|T71729]]) (duration: 00m 58s)
* 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist ([[phab:T265558|T265558]]) (duration: 00m 59s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis ([[phab:T265556|T265556]]) (duration: 00m 58s)
* 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 17:48 mutante: an-worker109* - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:45 mutante: releases2002, netmon2001, various other hosts - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: [[phab:T265809|T265809]], {{Gerrit|I1011f63ae61f5a6}} (duration: 01m 00s)
* 16:41 XioNoX: bounce security log on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
* 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
* 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
* 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
* 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
* 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
* 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
* 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
* 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
* 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
* 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
* 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
* 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
* 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki (duration: 16m 43s)
* 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
* 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
* 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
* 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
* 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
* 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
* 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
* 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
* 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
* 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
* 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
* 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
* 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org [[phab:T265857|T265857]]
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bff6b37a55fe8f260fe00cbb942c53101167fb07}}: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T266390|T266390]]) (duration: 01m 14s)
* 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - [[phab:T265911|T265911]]
* 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
* 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - [[phab:T265911|T265911]]
* 10:18 godog: roll restart pybal to apply latest configuration
* 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
* 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
* 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:58 moritzm: installing freetype security updates for stretch
* 08:57 XioNoX: remove down sessions to AS38758
* 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:43 XioNoX: remove down sessions to AS8560
* 08:41 XioNoX: remove down sessions to AS31334
* 08:28 XioNoX: remove down sessions to AS6327
* 08:27 XioNoX: remove down sessions to AS8674
* 08:25 XioNoX: remove down sessions to AS24429
* 08:21 XioNoX: remove down sessions to AS16509
* 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
* 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
* 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
* 06:10 marostegui: Warm up tables [[phab:T261914|T261914]]


== 2020-10-25 ==
== 2021-10-09 ==
* 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 dwisehaupt: kernel upgrade and reboot for fran1001
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]


== 2020-10-23 ==
== 2021-10-08 ==
* 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins ([[phab:T266086|T266086]])
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ [[phab:T266155|T266155]]
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - [[phab:T266023|T266023]] (forgot to log this earlier)
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - [[phab:T257905|T257905]]
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes [[phab:T264388|T264388]]
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - [[phab:T257905|T257905]]
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:32 ryankemper: [[phab:T292814|T292814]] Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id [[phab:T292814|T292814]]` on `ryankemper@cumin1001` tmux `elastic`
* 04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
* 04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
* 04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
* 04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
* 03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' {{!}} mwscript purgeList.php , ref [[phab:T287425|T287425]], [[phab:T292810|T292810]]
* 00:07 tgr_: deploy window over
* 00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498{{!}}Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)


== 2020-10-22 ==
== 2021-10-07 ==
* 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - [[phab:T257940|T257940]]
* 23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 3/3 (duration: 00m 55s)
* 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded [[phab:T265963|T265963]]
* 23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 2/3 (duration: 00m 55s)
* 21:56 mutante: deploy1002 - scap pull and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver ([[phab:T265963|T265963]])
* 23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 1/3 (duration: 00m 56s)
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 2/2 (duration: 00m 56s)
* 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 1/2 (duration: 00m 57s)
* 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
* 21:35 urbanecm: Password reset for SUL User:LA2-bot ([[phab:T292793|T292793]])
* 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:43 brennen@deploy1002: rebuilt and synchroniz