
Server admin log/Archive 40: Difference between revisions

== 2020-04-30 ==
* 23:21 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|9065650}}: Add project taglines ([[phab:T249047|T249047]]) (duration: 01m 04s)
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|9065650}}: Add project taglines ([[phab:T249047|T249047]]) (duration: 01m 04s)
* 23:17 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|9065650}}: Add project taglines ([[phab:T249047|T249047]]) (duration: 01m 04s)
* 23:14 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|9065650}}: Add project taglines ([[phab:T249047|T249047]]) (duration: 01m 05s)
* 23:08 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|ae1424a}}: Logo wordmarks should not define fill color - opacity will be used ([[phab:T251135|T251135]]) (duration: 01m 05s)
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cf5f7ff}}: Assign oathauth-verify-user to stewards ([[phab:T251447|T251447]]) (duration: 01m 05s)
* 20:13 shdubsh: test mtail rc35 upgrade on logstash1007 - [[phab:T251466|T251466]]
* 20:10 rzl: mcrouter certs re-renewed on puppetmaster1001, puppet enabled on mcrouter hosts
* 20:05 jeh: cloudvirt1024 upgrade iDRAC firmware from 2.4.8 to 2.5.4 [[phab:T241884|T241884]]
* 20:04 rzl: Disabling puppet on all mcrouter hosts for cert renewal. This isn't strictly needed, as the certs from last time are still fine -- just testing the renewal script.
* 19:42 jeh: reboot cloudvirt1024 for NIC firmware updates [[phab:T241884|T241884]]
* 19:25 shdubsh: test mtail rc35 upgrade on fermium - [[phab:T251466|T251466]]
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091', diff saved to https://phabricator.wikimedia.org/P11104 and previous config saved to /var/cache/conftool/dbconfig/20200430-174057-marostegui.json
* 17:37 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 and group2 wikis to 1.35.0-wmf.28"
* 17:24 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: labs only (duration: 00m 58s)
* 15:28 volans@deploy1001: Finished deploy [homer/deploy@56506db]: Release v0.2.1 (duration: 00m 21s)
* 15:27 volans@deploy1001: Started deploy [homer/deploy@56506db]: Release v0.2.1
* 15:11 marostegui: Create lag on es1021
* 14:53 krinkle@deploy1001: Synchronized wmf-config/db-eqiad.php: {{Gerrit|I46d2b811f6287689}} (duration: 00m 57s)
* 14:38 krinkle@deploy1001: Synchronized wmf-config/db-codfw.php: {{Gerrit|I46d2b811f6287689}} (duration: 00m 57s)
* 14:24 vgutierrez: upgrade trafficserver to 8.0.7-1wm2 on cp[5006,5011]
* 14:13 marostegui: Stop slave on es2020 for testing
* 14:11 vgutierrez: rolling restart of ats-tls on text@esams - [[phab:T249335|T249335]]
* 14:01 vgutierrez: upgrade trafficserver to version 8.0.7-1wm2 on cp4025 and cp4031
* 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.30
* 12:44 arturo: re-enable puppet in apt1001
* 12:35 jbond42: rolling restart of php7.2-fpm on mw1* servers
* 12:09 jbond42: rolling restart of thumbor service
* 12:02 arturo: disable puppet in apt1001 to briefly test a reprepro pull filter before merging a proper patch
* 11:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:56 jbond42: updating tiff on stretch
* 11:54 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to clean unused openstack components and packages (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/593223)
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|83e1475}}: Uncoupling graphoid on testwiki ([[phab:T242855|T242855]]) (duration: 01m 06s)
* 11:07 marostegui: Deploy schema change on db1091
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P11099 and previous config saved to /var/cache/conftool/dbconfig/20200430-110721-marostegui.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11098 and previous config saved to /var/cache/conftool/dbconfig/20200430-110539-marostegui.json
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6572e25}}: Enable transwiki import from wikidata, frwikisource and hiwikibooks in hiwikisource ([[phab:T251485|T251485]]) (duration: 01m 12s)
* 10:51 mutante: bromine,vega,miscweb[12]002: rm -rf /srv/org/wikimedia/TransparencyReport-private
* 10:09 jayme: imported helm 2.12.2-4 to main for buster-wikimedia
* 09:53 jayme: imported helm3 3.2.0-1+deb10u1 to main for buster-wikimedia
* 09:47 kormat: reimaging db1077 for testing purposes [[phab:T251392|T251392]]
* 08:36 XioNoX: change blackhole term scope on all routers - [[phab:T226742|T226742]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089', diff saved to https://phabricator.wikimedia.org/P11097 and previous config saved to /var/cache/conftool/dbconfig/20200430-075211-marostegui.json
* 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P11096 and previous config saved to /var/cache/conftool/dbconfig/20200430-065044-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P11095 and previous config saved to /var/cache/conftool/dbconfig/20200430-065008-marostegui.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089', diff saved to https://phabricator.wikimedia.org/P11094 and previous config saved to /var/cache/conftool/dbconfig/20200430-064450-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092', diff saved to https://phabricator.wikimedia.org/P11093 and previous config saved to /var/cache/conftool/dbconfig/20200430-051818-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092', diff saved to https://phabricator.wikimedia.org/P11092 and previous config saved to /var/cache/conftool/dbconfig/20200430-051637-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P11091 and previous config saved to /var/cache/conftool/dbconfig/20200430-051506-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P11090 and previous config saved to /var/cache/conftool/dbconfig/20200430-051329-marostegui.json
* 05:02 marostegui: Restart x1 master finished - [[phab:T250701|T250701]]
* 05:00 marostegui: Restart x1 master (db1120) - [[phab:T250701|T250701]]
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11089 and previous config saved to /var/cache/conftool/dbconfig/20200430-045159-marostegui.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314', diff saved to https://phabricator.wikimedia.org/P11088 and previous config saved to /var/cache/conftool/dbconfig/20200430-043803-marostegui.json
* 00:26 twentyafterfour: phabricator update finished
* 00:15 twentyafterfour: deploying phabricator update: https://phabricator.wikimedia.org/project/view/4620/
* 00:11 eileen: process-control config revision is {{Gerrit|1f31dd21c5}}
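
Entries in this log appear to follow the shape `* HH:MM actor[@host]: message`. The following is a minimal Python sketch for splitting one entry into its parts, assuming that shape holds; `LogEntry`, `ENTRY_RE`, and `parse_entry` are hypothetical names chosen for illustration and are not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One log bullet split into its parts (hypothetical structure)."""
    time: str             # "HH:MM" as written in the log
    actor: str            # user or nick, e.g. "marostegui"
    host: Optional[str]   # e.g. "cumin1001"; None when no "@host" is present
    message: str          # everything after the first ": "


# Assumed entry shape: "* HH:MM actor[@host]: message"
ENTRY_RE = re.compile(
    r"^\*\s+(?P<time>\d{2}:\d{2})\s+"
    r"(?P<actor>[^\s:@]+)(?:@(?P<host>[^\s:]+))?:\s+"
    r"(?P<message>.*)$"
)


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None for headings and other non-entry lines."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["time"], m["actor"], m["host"], m["message"])
```

Lines that do not match, such as the `== 2020-04-30 ==` headings, return `None` rather than raising, so a caller can filter a whole page in one pass.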
 
== 2020-04-29 ==
* 23:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T91649|T91649]] Drop Sentry, Part II: Stop configuring it for production or Beta Cluster (duration: 01m 05s)
* 23:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T91649|T91649]] Drop Sentry, Part I: Stop loading it anywhere (duration: 01m 05s)
* 23:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set Growth Screener survey sample rate to 0.1% and limit to anons only ([[phab:T248421|T248421]]) (duration: 01m 05s)
* 23:26 RoanKattouw: Ran updateArticleCount.php on trwikisource
* 23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgArticleCount to any on trwikisource (duration: 01m 06s)
* 21:59 bstorm_: upgrading RAID firmware on labsdb1011 [[phab:T249188|T249188]]
* 21:34 volker-e@deploy1001: Finished deploy [design/style-guide@c4956c3]: Deploy design/style-guide:  (duration: 00m 08s)
* 21:34 volker-e@deploy1001: Started deploy [design/style-guide@c4956c3]: Deploy design/style-guide:
* 21:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GlobalBlocking/includes/api/ApiQueryGlobalBlocks.php: [[phab:T251430|T251430]] Unconditionally select gb_timestamp (duration: 01m 06s)
* 21:19 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GlobalBlocking/includes/api/ApiQueryGlobalBlocks.php: [[phab:T251430|T251430]] Unconditionally select gb_timestamp (duration: 01m 05s)
* 21:17 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/Quiz/includes/Quiz.php: Don't crash if quiz attempts to include a bad title [[phab:T251409|T251409]] (duration: 01m 06s)
* 20:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (doc-only) Fix Phabricator task reference for jvwiki logo (duration: 01m 05s)
* 19:34 addshore: repool wdqs2008
* 19:02 addshore: depooling and stopping the updater on wdqs2008 for some query tests (wdqs-internal)
* 17:56 joal@deploy1001: Finished deploy [analytics/refinery@6460d05] (thin): Regular analytics weekly train THIN [{{Gerrit|6460d05}}] (duration: 00m 08s)
* 17:56 joal@deploy1001: Started deploy [analytics/refinery@6460d05] (thin): Regular analytics weekly train THIN [{{Gerrit|6460d05}}]
* 17:54 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 17:44 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 17:24 joal@deploy1001: Finished deploy [analytics/refinery@6460d05]: Regular analytics weekly train [{{Gerrit|6460d05}}] (duration: 77m 08s)
* 16:55 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/MachineVision/src/Hooks.php: Fix hook handling for  hook [[phab:T251408|T251408]] (duration: 01m 05s)
* 16:54 jforrester@deploy1001: sync-file aborted: Fix hook handling for  hook [[phab:T251408|T251408]] (duration: 00m 02s)
* 16:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/EditPage.php: EditPage::showHeader - only warn editing an old revision if it exists [[phab:T251404|T251404]] (duration: 01m 06s)
* 16:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:07 joal@deploy1001: Started deploy [analytics/refinery@6460d05]: Regular analytics weekly train [{{Gerrit|6460d05}}]
* 16:05 joal@deploy1001: Finished deploy [analytics/aqs/deploy@c87c8e2]: Analytics regular weekly deploy (duration: 06m 59s)
* 15:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@c87c8e2]: Analytics regular weekly deploy
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db2087 in s6 and s7 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11085 and previous config saved to /var/cache/conftool/dbconfig/20200429-151219-kormat.json
* 14:56 sukhe: upload cescout 0.1.2-1 to apt.wm.o (buster)
* 14:49 mdholloway: re-ran extension/MachineVision/maintenance/withholdImages.php on commonswiki
* 14:39 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update image withholding term list (duration: 01m 06s)
* 13:05 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 04s)
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:26 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2087 for reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11081 and previous config saved to /var/cache/conftool/dbconfig/20200429-122602-kormat.json
* 11:58 mutante: running puppet on cp-ats - switching backends of wikiworkshop.org
* 11:05 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new properties to wmgWBRepoPreferredPageImagesProperties ([[phab:T249811|T249811]]) (duration: 01m 18s)
* 10:00 kormat: reimaging db2087 to buster [[phab:T250666|T250666]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P11080 and previous config saved to /var/cache/conftool/dbconfig/20200429-095629-marostegui.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11079 and previous config saved to /var/cache/conftool/dbconfig/20200429-095545-marostegui.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P11078 and previous config saved to /var/cache/conftool/dbconfig/20200429-095527-marostegui.json
* 09:38 jbond42: puppet enabled fleetwide
* 09:30 jbond42: disable puppet fleet wide for puppetdb upgrade
* 09:30 jbond42: disable puppet for puppetdb upgrade
* 09:10 vgutierrez: starting rolling restart of ats-tls to enable the TLS session ID based cache - [[phab:T170567|T170567]]
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 08:49 mutante: gerrit1002 - gzipping gerrit.log.2020-04* files in /var/log/gerrit ([[phab:T243808|T243808]])
* 08:45 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 08:32 vgutierrez: upgrade to ATS 8.1 on cp4032 - [[phab:T249335|T249335]]
* 08:31 vgutierrez: restart ats-tls on cp[3054,3064]
* 08:08 moritzm: installing openldap security updates
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11075 and previous config saved to /var/cache/conftool/dbconfig/20200429-080206-marostegui.json
* 07:55 marostegui: Upgrade mysql on x1 master (without restarting) in preparation for tomorrow's upgrade - [[phab:T250701|T250701]]
* 07:54 _joe_: restarting php-fpm on mw1288 (workers die in SIGILL status)
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11074 and previous config saved to /var/cache/conftool/dbconfig/20200429-073144-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11073 and previous config saved to /var/cache/conftool/dbconfig/20200429-071431-marostegui.json
* 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:32 marostegui: stop mysql on db1105 for reimage
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and 3312 for reimage', diff saved to https://phabricator.wikimedia.org/P11072 and previous config saved to /var/cache/conftool/dbconfig/20200429-062254-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314', diff saved to https://phabricator.wikimedia.org/P11071 and previous config saved to /var/cache/conftool/dbconfig/20200429-061941-marostegui.json
* 06:17 vgutierrez: ats-tls restart on cp[3050,3058]  - [[phab:T249335|T249335]]
* 06:07 vgutierrez: ats-tls restart on cp3064  - [[phab:T249335|T249335]]
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1114', diff saved to https://phabricator.wikimedia.org/P11070 and previous config saved to /var/cache/conftool/dbconfig/20200429-054733-marostegui.json
* 00:56 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Growth Study QuickSurvey on enwiki (with sample size 0, for testing) ([[phab:T248421|T248421]]) (duration: 01m 10s)
* 00:43 catrope@deploy1001: Finished scap: Update WikimediaMessages with new i18n messages for [[phab:T248421|T248421]] (duration: 55m 23s)
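
Many deploy entries end with a suffix of the form `(duration: MMm SSs)`. A small Python sketch for converting that suffix to seconds, assuming the suffix always uses minutes and seconds as seen in the entries above; `DURATION_RE` and `duration_seconds` are hypothetical helper names.

```python
import re
from typing import Optional

# Assumed suffix shape: "(duration: 01m 05s)"; minutes may exceed 59.
DURATION_RE = re.compile(r"\(duration:\s*(\d+)m\s+(\d+)s\)")


def duration_seconds(message: str) -> Optional[int]:
    """Return the logged duration in seconds, or None if no suffix is present."""
    m = DURATION_RE.search(message)
    if m is None:
        return None
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return minutes * 60 + seconds
```

Entries without a duration suffix, such as the `Started deploy` lines, yield `None`.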
 
== 2020-04-28 ==
* 23:48 catrope@deploy1001: Started scap: Update WikimediaMessages with new i18n messages for [[phab:T248421|T248421]]
* 23:40 ejegg: updated payments-wiki from {{Gerrit|8c896a8247}} to {{Gerrit|afb84cc391}}
* 21:55 ejegg: updated Payments IPN listener (Standalone SmashPig) from {{Gerrit|d80e4c5abd}} to {{Gerrit|8c30ed7fe5}}
* 20:57 cdanis@cumin1001: dbctl commit (dc=all): 's8 weights: -db1111, +db1099,db1101', diff saved to https://phabricator.wikimedia.org/P11069 and previous config saved to /var/cache/conftool/dbconfig/20200428-205739-cdanis.json
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0639d9f}}: Allow bdwikimedia bureaucrats to revoke sysop flag ([[phab:T251078|T251078]]) (duration: 01m 05s)
* 18:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|07c28d1}}: GrowthExperiments: cswiki: Change manual of style to 5 pillars ([[phab:T251290|T251290]]) (duration: 01m 05s)
* 17:51 reedy@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/OAuth: [[phab:T251306|T251306]] (duration: 01m 06s)
* 17:13 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@678fb8e]: Update mobileapps to {{Gerrit|ff88022a}} (duration: 03m 23s)
* 17:10 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@678fb8e]: Update mobileapps to {{Gerrit|ff88022a}}
* 16:37 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/Revision/RevisionStore.php: Follow-up {{Gerrit|If770120}}: Fix bad combination of type cast and ?? operator (duration: 01m 06s)
* 16:36 volker-e@deploy1001: Finished deploy [design/style-guide@335122b]: Deploy design/style-guide:  (duration: 00m 08s)
* 16:36 volker-e@deploy1001: Started deploy [design/style-guide@335122b]: Deploy design/style-guide:
* 15:49 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:42 mepps: updated payments-wiki from {{Gerrit|45bf1734e0}} to {{Gerrit|8c896a8247}}
* 15:34 vgutierrez: rolling restart of ats-tls on cp[3050,3052,3054,3056] - [[phab:T249335|T249335]]
* 15:28 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2b87a75]: Switch off rules moved to k8s [[phab:T248677|T248677]] (duration: 01m 20s)
* 15:27 ppchelko@deploy1001: Started deploy [changeprop/deploy@2b87a75]: Switch off rules moved to k8s [[phab:T248677|T248677]]
* 15:15 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:13 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:13 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:13 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:59 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:58 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:58 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:56 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:55 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:50 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:49 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:49 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:45 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:45 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:39 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:39 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:37 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:37 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:35 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:35 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:33 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:28 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:25 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:25 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:25 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:23 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:23 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:21 moritzm: restarting KDC on krb1001 to pick up openssl update
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:59 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:59 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:54 moritzm: installing idp-test2001.wikimedia.org
* 13:53 vgutierrez: update ATS 8.1 on cp4026 - [[phab:T249335|T249335]]
* 13:48 hknust: holger@mwmaint1002 end (frwiki=success)
* 13:44 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:43 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:43 ottomata: enabling Kafka TLS for eventgate-main
* 13:33 mutante: running puppet on cp-ats - switching backends of design.wikimedia.org and sitemaps.wikimedia.org
* 13:30 hknust: Restarting uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]] for frwiki
* 13:26 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 13:08 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 13:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.30
* 12:39 marostegui: Deploy schema change on dbstore1004:3314
* 12:39 marostegui: Deploy schema change on db1102:3314
* 12:35 marostegui: Temporarily change query killer from 300 seconds to 3600 on labsdb1010 [[phab:T249188|T249188]]
* 11:56 Lucas_WMDE: EU SWAT done
* 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=thwikibooks --fix {{!}} tee [[phab:T251118|T251118]]-fix
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592645{{!}}Create a bunch of namespace aliases for thwikibooks (T251118)]] (duration: 01m 05s)
* 11:52 liw@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.30 (duration: 48m 53s)
* 11:45 marostegui: Deploy schema change on s8 eqiad master with replication [[phab:T250071|T250071]]
* 11:34 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:33 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:20 moritzm: updated ssacli/ssaducli for buster-wikimedia's thirdparty/hwraid component to 4.15-6.0
* 11:04 liw@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.30
* 10:48 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.27 (duration: 12m 37s)
* 10:48 _joe_: running heavy_page test on mw1407,9
* 10:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reimaging to buster [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11064 and previous config saved to /var/cache/conftool/dbconfig/20200428-104650-kormat.json
* 10:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2014.codfw.wmnet
* 10:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2014.codfw.wmnet
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2014.codfw.wmnet
* 10:40 XioNoX: remove unused policy-statements from routers
* 10:39 ema: cp-text: upgrade purged to 0.9 and restart
* 10:38 _joe_: running load.php test on mw1407,9
* 10:34 _joe_: running main_page test on mw1407,9
* 10:28 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.30 (duration: 01m 27s)
* 10:28 addshore: repool wdqs1007 (lag caught up)
* 10:10 _joe_: starting benchmarks for light page on mw140<nowiki>{</nowiki>7,9<nowiki>}</nowiki>
* 10:08 ema: upload purged 0.9 to buster-wikimedia
* 10:05 liw: 1.35.0-wmf.30 was branched at {{Gerrit|ffc8e887573d7b288067b263c5b6047b2b2db081}} for [[phab:T249962|T249962]]
* 09:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:52 liw: starting branch cut for train
* 09:35 addshore: depool wdqs1007 to catch up on lag a bit
* 09:32 mutante: running puppet on cp-ats for backend config change
* 09:23 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 09:20 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2124 [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11063 and previous config saved to /var/cache/conftool/dbconfig/20200428-092052-kormat.json
* 09:12 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99)
* 09:12 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 08:55 XioNoX: re-set lost licenses on asw2-a/b-eqiad
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11060 and previous config saved to /var/cache/conftool/dbconfig/20200428-084041-marostegui.json
* 08:36 dcausse: deleting wikidatawiki_content_1587076410 from cloudelastic
* 08:30 _joe_: restarting php-fpm on mw1407 and mw1409 again, then running traffic on them for 1 hour.
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11059 and previous config saved to /var/cache/conftool/dbconfig/20200428-082420-marostegui.json
* 08:21 dcausse: restarting blazegraph on wdqs1007 ([[phab:T242453|T242453]])
* 08:20 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:17 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 08:13 kormat: reimaging db2124 to buster [[phab:T250666|T250666]]
* 08:13 mutante: rsyncing transparency-report-private files from bromine to miscweb1002/2002. git-cloning was removed about a year ago but site still exists. need to figure out if it should be deleted ([[phab:T188362|T188362]] [[phab:T247650|T247650]])
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11058 and previous config saved to /var/cache/conftool/dbconfig/20200428-080920-marostegui.json
* 08:06 moritzm: installing qemu security updates
* 07:52 _joe_: running benchmarks on mw1407 (LCStoreStaticArray) and mw1409 (LCStoreCDB) for [[phab:T99740|T99740]]: restart php-fpm, pool for 5 minutes to warmup caches, then depool both servers.
* 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:26 marostegui: Reimage db1105
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and 3312 for reimage', diff saved to https://phabricator.wikimedia.org/P11057 and previous config saved to /var/cache/conftool/dbconfig/20200428-072416-marostegui.json
* 06:35 marostegui: Deploy schema change on s3 master with replication for the wikis at [[phab:T250071|T250071]]#6051598 - [[phab:T250071|T250071]]
* 06:06 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T250055|T250055]]
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11056 and previous config saved to /var/cache/conftool/dbconfig/20200428-055719-marostegui.json
* 05:52 marostegui: Reclone labsdb1011 from labsdb1012 - [[phab:T249188|T249188]]
* 05:42 marostegui: Restart labsdb1011 with innodb_purge_threads set to 10 - [[phab:T249188|T249188]]
* 05:35 marostegui: Deploy schema change on db1112
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P11054 and previous config saved to /var/cache/conftool/dbconfig/20200428-053453-marostegui.json
* 04:59 vgutierrez: depool and powercycle cp5012
* 04:37 kart_: Updated cxserver to 2020-04-27-061703-production ([[phab:T249852|T249852]])
* 04:34 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 04:22 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 04:18 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
 
== 2020-04-27 ==
* 23:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update logos for tiwiki and tiwiktionary ([[phab:T150618|T150618]], [[phab:T249451|T249451]]) (duration: 00m 57s)
* 23:20 catrope@deploy1001: Synchronized static/images/project-logos/: Update logos for tiwiki and tiwiktionary ([[phab:T150618|T150618]], [[phab:T249451|T249451]]) (duration: 00m 58s)
* 23:18 catrope@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: Enable VisualEditor by default on srwiki ([[phab:T250878|T250878]]) (duration: 00m 57s)
* 23:16 catrope@deploy1001: Synchronized wmf-config/config/srwiki.yaml: Enable VisualEditor by default on srwiki ([[phab:T250878|T250878]]) (duration: 00m 58s)
* 20:58 bearND: mobileapps deploy on canary failed due to timeouts, rolled back.
* 20:56 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@99c350c]: Update mobileapps to {{Gerrit|09cb7c2e}} (duration: 00m 52s)
* 20:55 hknust: holger@mwmaint1002 END (enwiki=success, frwiki=fail) uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]]
* 20:55 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@99c350c]: Update mobileapps to {{Gerrit|09cb7c2e}}
* 20:43 bearND: mobileapps deploy failed due to timeouts, rolled back.
* 20:42 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@99c350c]: Update mobileapps to {{Gerrit|09cb7c2e}} (duration: 06m 24s)
* 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@99c350c]: Update mobileapps to {{Gerrit|09cb7c2e}}
* 20:28 hknust: holger@mwmaint1002 Restarting uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]] for 2 wikis
* 19:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:23 ppchelko@deploy1001: Finished deploy [changeprop/deploy@ecca66b]: Switch off rules moved to k8s [[phab:T248677|T248677]] (duration: 01m 22s)
* 19:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:22 ppchelko@deploy1001: Started deploy [changeprop/deploy@ecca66b]: Switch off rules moved to k8s [[phab:T248677|T248677]]
* 19:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:50 James_F: Manually ran `scap pull` on mw1279.eqiad.wmnet as it flaked during deploy.
* 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Ready wmgVisualEditorAllowExternalLinkPaste to set wgVisualEditorAllowExternalLinkPaste (duration: 01m 29s)
* 18:48 James_F: Sync failure to mw1279.eqiad.wmnet (timeout)
* 18:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS: Set wmgVisualEditorAllowExternalLinkPaste false everywhere except officewiki (duration: 01m 17s)
* 18:25 Urbanecm: Run namespaceDupes.php for thwikisource ([[phab:T251134|T251134]])
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|56a447e}}: Create several namespace aliases for thwikisource ([[phab:T251134|T251134]]) (duration: 00m 58s)
* 18:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Kartographer/modules/: SWAT: {{Gerrit|6cd2847}}: Do not use remove() on maplinks ([[phab:T250620|T250620]]; [[phab:T251053|T251053]]) (duration: 00m 58s)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|8b71f38}}: Remove use of `wgAllowImageMoving` ([[phab:T245293|T245293]]) (duration: 00m 57s)
* 18:04 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: wgEventStreams: in beta, merge settings from production - [[phab:T242122|T242122]] (duration: 00m 56s)
* 18:02 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: configure SearchSatisfaction - [[phab:T249261|T249261]] (duration: 00m 58s)
* 17:34 ppchelko@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:31 ppchelko@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:46 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:07 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:02 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:45 gehel: restart wdqs-updater on all servers
* 15:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:26 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11048 and previous config saved to /var/cache/conftool/dbconfig/20200427-152242-marostegui.json
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P11047 and previous config saved to /var/cache/conftool/dbconfig/20200427-145851-marostegui.json
* 14:50 jynus: setting default etherpadlite db on m1 to utf8mb4_bin
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11046 and previous config saved to /var/cache/conftool/dbconfig/20200427-145010-marostegui.json
* 14:46 vgutierrez: pool cp4026 running ATS 8.1.0 - [[phab:T249335|T249335]]
* 14:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:30 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:30 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:27 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:27 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P11045 and previous config saved to /var/cache/conftool/dbconfig/20200427-142006-marostegui.json
* 13:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:55 vgutierrez: depool cp4026 and upgrade to ATS 8.1.0 - [[phab:T249335|T249335]]
* 13:53 vgutierrez: restart ats-tls on cp3056 - [[phab:T249335|T249335]]
* 13:52 mutante: decom'ing install1002 and install2002 - see install1003/2003 and apt1001/2001
* 13:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:47 marostegui: Deploy schema change on s3 codfw, lag will show up - [[phab:T250055|T250055]]
* 13:46 marostegui: Drop img_deleted column from wikitech - [[phab:T250055|T250055]]
* 13:45 marostegui: Drop img_deleted column from s7 eqiad - [[phab:T250055|T250055]]
* 13:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 13:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:41 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 13:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:38 _joe_: repooling both mw1407 and mw1409 for testing [[phab:T99740|T99740]]
* 13:30 _joe_: depooled mw1409 as well as mw1407 for further benchmarking, [[phab:T99740|T99740]]
* 13:28 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:28 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 13:10 elukey: roll restart elastic on cloudelastic-chi again to pick up new JVM settings - [[phab:T231517|T231517]]
* 13:09 marostegui: Deploy schema change on s7 codfw, lag will show up - [[phab:T250055|T250055]]
* 12:53 marostegui: Drop T248086_wb_terms from db1104 - [[phab:T248086|T248086]]
* 12:50 marostegui: Removed img_deleted from s1 (enwiki) [[phab:T250055|T250055]]
* 12:49 akosiaris: rolling back etherpad to 1.8.0
* 12:45 akosiaris: upgrade etherpad to 1.8.3
* 12:41 marostegui: Remove empty table T248086_wb_terms from wikidatawiki on s3 eqiad - [[phab:T248086|T248086]]
* 12:36 marostegui: Remove empty table T248086_wb_terms from wikidatawiki on s3 codfw master - [[phab:T248086|T248086]]
* 12:32 marostegui: Remove empty table T248086_wb_terms from wikidatawiki on s8 codfw master - [[phab:T248086|T248086]]
* 12:15 marostegui: Remove empty table T248086_wb_terms from commonswiki and testcommonswiki on s4 master - [[phab:T248086|T248086]]
* 12:06 ema: cp: upgrade purged to 0.8 [[phab:T249583|T249583]]
* 11:53 Lucas_WMDE: EU SWAT done
* 11:49 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary. ([[phab:T249613|T249613]])
* 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592515{{!}}Enable cross-project search on frwiktionary (T250724)]] (duration: 00m 57s)
* 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592330{{!}}Add transwiki import sources in zhwiki (T250972)]] (duration: 00m 57s)
* 11:25 addshore: repool wdqs1007 [[phab:T242453|T242453]]
* 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592306{{!}}Add two domains in wgCopyUploadsDomains (T250903, T250904)]] (duration: 00m 57s)
* 11:21 _joe_: restarted php-fpm on mw1407 to pick up enlarged opcache values, [[phab:T99740|T99740]]
* 11:14 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:592634{{!}} Bumping portals to master (563985)]] (duration: 00m 57s)
* 11:13 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:592634{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable constraints on production commons (duration: 00m 57s)
* 11:08 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable constraints on production commons (duration: 00m 58s)
* 10:52 hoo: Running the pruneItemsPerSite on mwmaint1002 maintenance script for Wikidata ([[phab:T249613|T249613]])
* 10:52 hoo@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Wikibase: pruneItemsPerSite: Fix join_condition call signature ([[phab:T249613|T249613]]) (duration: 01m 02s)
* 10:49 hoo@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Wikibase: pruneItemsPerSite: Fix join_condition call signature ([[phab:T249613|T249613]]) (duration: 01m 01s)
* 10:32 mutante: contint2001 - systemd status was degraded. icinga alerted. failed unit was jenkins. starting it failed with "address already in use". manually started without using systemctl?  killed jenkins and started again with systemctl.  [[phab:T224591|T224591]]
* 10:29 mutante: contint2001 - jenkins failed and can't start because address is already in use
* 10:23 addshore: depool and restart wdqs1007 (deadlocks) [[phab:T242453|T242453]]
* 09:54 hoo@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Wikibase: Add pruneItemsPerSite maintenance script ([[phab:T249613|T249613]]) (duration: 01m 06s)
* 09:34 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:34 jynus@cumin2001: START - Cookbook sre.hosts.decommission
* 09:34 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:33 jynus@cumin2001: START - Cookbook sre.hosts.decommission
* 09:33 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:32 jynus@cumin2001: START - Cookbook sre.hosts.decommission
* 09:32 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:31 jynus@cumin2001: START - Cookbook sre.hosts.decommission
* 09:25 marostegui: Stop MySQL on labsdb1012 to reclone labsdb1011 - [[phab:T249188|T249188]]
* 09:11 marostegui: Deploy schema change on s1 codfw, lag will show up - [[phab:T250055|T250055]]
* 08:52 moritzm: restarting cas on idp1001 to pick up Java 11 security update (will void active SSO sessions)
* 08:26 marostegui: Deploy schema change on s5 codfw, lag will show up - [[phab:T250055|T250055]]
* 08:24 kormat: Truncating and optimizing parsercache for pc1010 and pc2010 [[phab:T247787|T247787]]
* 08:18 mutante: running puppet on all cp-ats
* 08:15 godog: add 80G to prometheus global LV
* 07:25 elukey: roll restart elastic-chi on cloudelastic100[1-4] to pick up the last JVM GC settings - [[phab:T231517|T231517]]
* 07:15 marostegui: Kill updateSpecialPages.php wikidatawiki --override --only=Fewestrevisions as it is causing lag - [[phab:T238199|T238199]]
* 07:14 elukey: powercycle an-worker1089 - unreachable via ssh, mgmt serial available, soft cpu lock events registered in dmesg
* 06:59 elukey: force ifdown/ifup eno1 on analytics1052 - interface negotiated speed flapping
* 06:42 moritzm: installing Java security updates on IDP hosts, will void current SSO sessions
* 06:30 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:00 marostegui: Stop MySQL on labsdb1011 for reimage - [[phab:T249188|T249188]]
* 05:58 moritzm: installing git security updates on jessie
* 05:56 marostegui: Compress tables on db1104 - [[phab:T232446|T232446]]
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for defragmentation - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11039 and previous config saved to /var/cache/conftool/dbconfig/20200427-055320-marostegui.json
* 05:47 vgutierrez: rolling restart ats-tls in cp[1085,1089] and text@esams - [[phab:T249335|T249335]]
* 05:33 marostegui: Depool labsdb1011 [[phab:T249188|T249188]]
 
== 2020-04-26 ==
* 18:08 elukey: powercycle puppetmaster1001 - mgmt serial console not usable, no ssh, racadm getsel doesn't show anything
 
== 2020-04-25 ==
* 10:23 addshore: going to restart and probably depool for a short time wdqs1005 as it is in a deadlock [[phab:T242453|T242453]]
* 05:52 _joe_: depooling mw1407 again, should not be serving traffic
* 05:27 shdubsh: restart elasticsearch on logstash2022
 
== 2020-04-24 ==
* 21:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 19:41 Amir1: applying [[phab:T114117|T114117]] on labswiki (wikitech)
* 18:58 shdubsh: restart elasticsearch on logstash2021
* 18:50 shdubsh: restart elasticsearch on logstash2020
* 15:12 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 15:08 addshore: depool and restart wdqs1006 to catch up with lag after deadlock [[phab:T242453|T242453]]
* 11:13 Amir1: apply [[phab:T250071|T250071]] on s10 (labswiki)
 
== 2020-04-23 ==
* 22:06 Urbanecm: Perform rename that was timing out at enwiki, Wikipedia talk:Introduction --> Wikipedia talk:Introduction (historical), using moveBatch.php ([[:meta:Special:Diff/20009402{{!}}request]])
* 18:38 ejegg: updated payments-wiki from {{Gerrit|1640f5e21e}} to {{Gerrit|45bf1734e0}}
 
== 2020-04-22 ==
* 08:55 Urbanecm: Move User:Wikipedia:Introduction (historical) --> Wikipedia:Introduction (historical) at enwiki using moveBatch.php, on-wiki interface was timing out
* 05:50 elukey@deploy1001: Finished deploy [analytics/refinery@30facc4]: Test of new scap settings (duration: 04m 42s)
* 05:45 elukey@deploy1001: Started deploy [analytics/refinery@30facc4]: Test of new scap settings
* 05:25 elukey@deploy1001: deploy aborted: log (duration: 00m 02s)
* 05:24 elukey@deploy1001: Started deploy [analytics/refinery@30facc4]: log
* 01:55 milimetric@deploy1001: Finished deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (take 2, analytics1030 keeps failing) (duration: 00m 42s)
* 01:54 milimetric@deploy1001: Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (take 2, analytics1030 keeps failing)
* 01:54 milimetric@deploy1001: Finished deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (duration: 02m 54s)
* 01:51 milimetric@deploy1001: Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump
* 01:51 milimetric@deploy1001: deploy aborted: Analytics: another follow-up on the train, jar version bump (duration: 04m 08s)
* 01:46 milimetric@deploy1001: Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump
* 01:43 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T209749|T209749]] (duration: 01m 01s)
 
== 2020-04-21 ==
* 23:41 maryum: deploy complete for wdqs v0.3.23
* 23:36 mstyles@deploy1001: Finished deploy [wdqs/wdqs@4e0d55f]: v0.3.23 (duration: 11m 35s)
* 23:25 mstyles@deploy1001: Started deploy [wdqs/wdqs@4e0d55f]: v0.3.23
* 23:19 maryum: begin deploy of WDQS v 0.3.23 on deploy1001
* 22:41 eileen: process-control config revision is {{Gerrit|6294adfbaa}}
* 22:24 milimetric@deploy1001: Finished deploy [analytics/refinery@64c5ec4]: Analytics: tiny follow-up on weekly train [analytics/refinery@64c5ec4] (duration: 37m 05s)
* 21:56 andrewbogott: rebooting cloudvirt1004, total raid controller failure
* 21:50 urandom: bootstrapping restbase2014-c — [[phab:T250050|T250050]]
* 21:46 milimetric@deploy1001: Started deploy [analytics/refinery@64c5ec4]: Analytics: tiny follow-up on weekly train [analytics/refinery@64c5ec4]
* 21:38 milimetric@deploy1001: Finished deploy [analytics/refinery@35781db]: Regular Analytics weekly train deploy [analytics/refinery@35781db] try 2 (analytics1030 failed with OSError the first time) (duration: 00m 13s)
* 21:37 milimetric@deploy1001: Started deploy [analytics/refinery@35781db]: Regular Analytics weekly train deploy [analytics/refinery@35781db] try 2 (analytics1030 failed with OSError the first time)
* 21:21 milimetric@deploy1001: Finished deploy [analytics/refinery@35781db]: Regular Analytics weekly train deploy [analytics/refinery@35781db] (duration: 16m 19s)
* 21:05 milimetric@deploy1001: Started deploy [analytics/refinery@35781db]: Regular Analytics weekly train deploy [analytics/refinery@35781db]
* 21:05 milimetric@deploy1001: Finished deploy [analytics/refinery@35781db] (thin): Regular Analytics weekly train deploy THIN [analytics/refinery@35781db] (duration: 00m 08s)
* 21:05 milimetric@deploy1001: Started deploy [analytics/refinery@35781db] (thin): Regular Analytics weekly train deploy THIN [analytics/refinery@35781db]
* 19:09 rzl: mcrouter certs renewed on puppetmaster1001 (again); puppet re-enabled on mcrouter hosts and will update certs naturally over the next 30m [[phab:T248093|T248093]]
* 19:02 urandom: bootstrapping restbase2014-b — [[phab:T250050|T250050]]
* 18:28 hoo: Updated the Wikidata property suggester with data from the 2020-04-06 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 18:19 rzl: disabling puppet on all mcrouter hosts for cert renewal [[phab:T248093|T248093]]
* 17:19 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:49 urandom: bootstrapping restbase2014-a — [[phab:T250050|T250050]]
* 15:40 cmjohnson1: replacing mgmt switch on a6-eqiad [[phab:T250652|T250652]]
* 15:38 hashar: CI is back, patches would need to be rechecked by commenting "recheck" in Gerrit.
* 15:32 hashar: Restarting Gerrit [[phab:T250820|T250820]] [[phab:T246973|T246973]]
* 15:26 hashar: CI / Zuul does not get any events for some reason :/
* 14:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:59 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 hashar: contint2001: manually dropping /var/lib/docker (we now use /srv/docker )
* 14:48 jbond42: restart haproxy on dns-auth
* 14:48 hashar: restarting docker on contint2001
* 14:47 volker-e@deploy1001: Finished deploy [design/style-guide@d101234]: Deploy design/style-guide:  (duration: 00m 09s)
* 14:47 volker-e@deploy1001: Started deploy [design/style-guide@d101234]: Deploy design/style-guide:
* 14:45 jbond42: puppet enabled again
* 14:40 moritzm: restarting apache on miscweb
* 14:37 moritzm: restarting apache on netbox1001
* 14:36 jbond42: disable puppet fleet wide to restart puppetmaster
* 14:28 moritzm: installing OpenSSL security updates
* 14:17 vgutierrez: rolling upgrade of ats to version 8.0.7-1wm1
* 14:16 moritzm: installing OpenSSL updates on caches
* 14:08 hashar: contint1001: rm /var/log/apache2/doc_*  # service has been moved to doc1001.eqiad.wmnet
* 13:43 vgutierrez: upload trafficserver 8.0.7-1wm1 to apt.wm.o (buster)
* 13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:15 mutante: recreating cert for contint/integration to add integration.mediawiki.org in addition to integration.wikimedia.org
* 11:06 mutante: https://integration.wikimedia.org now also using TLS between ATS and contint1001 using envoy ([[phab:T210411|T210411]])
* 10:49 _joe_: mwdebug1001:~# iptables -A INPUT -s 10.64.32.208 -m statistic --mode random --probability 0.1 -j DROP ([[phab:T240684|T240684]])
* 08:52 ema: purged: rolling restart with 4 frontend workers
* 07:54 ema: cp3050: restart purged with 4 frontend workers
* 07:47 kormat: dropping old data and optimizing tables on pc1010 and pc2010 [[phab:T247787|T247787]]
* 07:26 ema: cp4032: restart ats-tls and ats-be
* 07:06 ema: cp4026: restart ats-tls and ats-be
* 06:30 marostegui: Rename flagged* tables on mediawikiwiki on db1075 - [[phab:T248298|T248298]]
* 06:24 XioNoX: restore eqsin/ulsfo OSPF metric - [[phab:T250653|T250653]]
* 05:46 marostegui: Deploy schema change on s6 codfw master
* 05:34 marostegui: Add db1095:3312, db1095:3320 to tendril - [[phab:T250602|T250602]]
* 05:32 moritzm: installing git security updates
* 05:19 marostegui: Deploy schema change on s2 codfw - [[phab:T250055|T250055]]
* 05:09 vgutierrez: rolling restart of ats-tls to enable SSL_OP_PRIORITIZE_CHACHA
 
== 2020-04-20 ==
* 23:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update project wordmarks and icons ([[phab:T249047|T249047]]) (duration: 01m 01s)
* 23:27 catrope@deploy1001: Synchronized static/images/mobile/: Update project wordmarks and icons ([[phab:T249047|T249047]]) (duration: 01m 02s)
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add media.api.aucklandmuseum.com to $wgCopyUploadsDomains ([[phab:T250646|T250646]]) (duration: 01m 08s)
* 21:11 mepps: update civicrm from {{Gerrit|1224b080c1}} to {{Gerrit|e8a0b5395d}}
* food: updated fundraising python tools from {{Gerrit|a93eec292d}} to {{Gerrit|c96813eda4}}
* 20:14 halfak@deploy1001: Finished deploy [ores/deploy@514f94a]: [[phab:T250536|T250536]] (duration: 14m 06s)
* 20:00 halfak@deploy1001: Started deploy [ores/deploy@514f94a]: [[phab:T250536|T250536]]
* 19:53 addshore: pool wdqs1006 again (caught up)
* 19:53 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 19:45 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Revert CirrusSearch-MoreLike pool counter numbers now rebuild is done (duration: 01m 01s)
* 19:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Move more_like from codfw back to eqiad, rebuild complete (duration: 01m 03s)
* 19:40 rzl: mcrouter certs renewed on puppetmaster1001; puppet re-enabled on mcrouter hosts and will update certs naturally over the next 30m [[phab:T248093|T248093]]
* 18:39 rzl: disabling puppet on all mcrouter hosts for cert renewal [[phab:T248093|T248093]]
* 18:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248418|T248418]] [testwiki] Force videojs-only mode for TimedMediaHandler (duration: 01m 01s)
* 18:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/MassMessage/includes/SpecialEditMassMessageList.php: [[phab:T250710|T250710]] Follow-up {{Gerrit|95c772864}}: Fix RevisionRecord calls that differ from Revision (duration: 01m 02s)
* 18:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adjust Parsoid/VE disable comment for wikitechwiki (duration: 01m 02s)
* 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:22 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:21 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:21 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:20 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Adjust dummy name of fake Parsoid extension to just 'Parsoid' (duration: 01m 01s)
* 18:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:14 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T236104|T236104]] Wait to update the globals cache file for opcache regeneration (duration: 01m 02s)
* 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 02s)
* 18:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 02s)
* 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: DiscussionTools: EditAttemptStepSamplingRate increase for some wikis [[phab:T250086|T250086]] (duration: 01m 10s)
* 15:33 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Victorgrigas /home/urbanecm/upload ([[phab:T250687|T250687]])
* 15:10 marostegui: Upgrade db2079
* 14:57 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 14:54 addshore: restart blazegraph on wdqs1006
* 14:53 addshore: depool wdqs1006 as it stopped updating
* 14:28 marostegui: Upgrade db2096 (x1 codfw master)
* 14:24 marostegui: Upgrade db2101
* 14:18 marostegui: Upgrade dbstore1005
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1127 after upgrade', diff saved to https://phabricator.wikimedia.org/P11025 and previous config saved to /var/cache/conftool/dbconfig/20200420-141711-marostegui.json
* 14:13 marostegui: Upgrade db2131
* 14:10 marostegui: Upgrade db1127
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for upgrade', diff saved to https://phabricator.wikimedia.org/P11023 and previous config saved to /var/cache/conftool/dbconfig/20200420-141017-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P11022 and previous config saved to /var/cache/conftool/dbconfig/20200420-140642-marostegui.json
* 13:50 marostegui: Deploy schema change on codfw master - [[phab:T250055|T250055]]
* 13:30 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Undeploying graphoid on beta (duration: 01m 07s)
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change, restore db1097:3314 original weights', diff saved to https://phabricator.wikimedia.org/P11021 and previous config saved to /var/cache/conftool/dbconfig/20200420-131823-marostegui.json
* 12:40 XioNoX: remove all disabled terms from cr2-eqiad
* 12:31 XioNoX: remove all disabled BGP neighbors on cr2-esams
* 12:11 mateusbs17: Running `REINDEX DATABASE gis` in maps2004.codfw.wmnet (which is depooled at the moment)
* 11:41 mutante: puppetmaster - revoking cert for webserver-misc-apps.discovery.wmnet and recreating it with additional static microsite names ([[phab:T247650|T247650]])
* 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:591017{{!}}Temporarily enable event oversampling for conflicts (T249616)]] (duration: 01m 00s)
* 11:25 awight@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/TwoColConflict: SWAT: [[gerrit:591016{{!}}Configurable EditStepAttempt oversampling for conflicts (T249616)]] (duration: 01m 03s)
* 11:05 mutante: rsyncing static-bugzilla files from bromine to miscweb1002 ([[phab:T247650|T247650]])
* 11:02 mutante: bromine/vega: stop rsyncd which was removed from puppet
* 10:49 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:591023{{!}} Bumping portals to master (563985)]] (duration: 00m 57s)
* 10:48 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:591023{{!}} Bumping portals to master (563985)]] (duration: 01m 03s)
* 10:37 elukey: apt-get purge rsync on mwlog* after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589600/
* 10:08 XioNoX: uRPF, sample + discard in eqiad - [[phab:T244147|T244147]]
* 10:06 XioNoX: uRPF, sample + discard in eqord - [[phab:T244147|T244147]]
* 09:51 XioNoX: uRPF, sample + discard in dfw - [[phab:T244147|T244147]]
* 09:38 XioNoX: uRPF, sample + discard in ulsfo - [[phab:T244147|T244147]]
* 09:19 Urbanecm: Security deploy for [[phab:T250594|T250594]]
* 08:46 vgutierrez: restart ats-tls in cp3064 - [[phab:T249335|T249335]]
* 08:35 jayme: imported helmfile 0.66.0-1+deb10u1 to main for buster-wikimedia
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Temporary pool db1097:3314 into API', diff saved to https://phabricator.wikimedia.org/P11019 and previous config saved to /var/cache/conftool/dbconfig/20200420-082019-marostegui.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081', diff saved to https://phabricator.wikimedia.org/P11018 and previous config saved to /var/cache/conftool/dbconfig/20200420-081911-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11017 and previous config saved to /var/cache/conftool/dbconfig/20200420-081623-marostegui.json
* 08:14 marostegui: Remove img_deleted column from db1089 (enwiki), db1081 (commonswiki), db1111 (wikidatawiki) - [[phab:T250055|T250055]]
* 08:09 jynus: restarting s3 instance on db1095 to reduce its buffer pool [[phab:T250602|T250602]]
* 07:22 _joe_: restarting php-fpm on the eqiad appservers to pick up the new max_execution_time
* 07:20 marostegui: Re-add tl_namespace index to db1104 and db1092 - [[phab:T250060|T250060]]
* 06:45 moritzm: installing python2.7 security updates on jessie
* 06:41 elukey: execute find -mtime +30 -delete in /var/log/airflow/scheduler on an-airflow1001 to free space
* 06:25 moritzm: installing libxdmcp security updates on jessie
* 06:16 moritzm: installing bash updates on jessie
* 05:54 vgutierrez: rolling restart of ats-tls in cp[3052,3054,3056,3058,3060,4028,4029,4030,4031,4032] - [[phab:T249335|T249335]]
* 05:53 marostegui: Deploy schema change on s8 eqiad hosts [[phab:T250060|T250060]]
* 05:50 marostegui: Deploy schema change on s8 codfw - lag will show up [[phab:T250060|T250060]]
* 04:55 ariel@deploy1001: Finished deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests (duration: 00m 04s)
* 04:55 ariel@deploy1001: Started deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests
 
== 2020-04-19 ==
* 16:19 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labs: Move RB traffic to new stretch host (duration: 01m 11s)
* 16:05 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 05:51 marostegui: Power back on db1140 [[phab:T250602|T250602]]
 
== 2020-04-18 ==
* 22:50 addshore: pool wdqs1006 blazegraph caught up [[phab:T242453|T242453]]
* 20:30 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 20:27 thcipriani: restart gerrit-replica
* 16:40 dcausse: forcing replica count to 1 on some cloudelastic@chi indices
* 15:13 Amir1: applying schema change of [[phab:T139090|T139090]] on labswiki (wikitech)
* 14:03 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 12:19 addshore: restarting blazegraph on wdqs1006 blazegraph stuck [[phab:T242453|T242453]]
* 12:15 addshore: depool wdqs1006 blazegraph stuck [[phab:T242453|T242453]]
* 06:07 XioNoX: change OSPF metrics to prefer ulsfo tunnel transport
 
== 2020-04-17 ==
* 19:33 Krinkle: Depool mw1407.eqiad.wmnet for opcache testing.  Do not repool without first reverting https://gerrit.wikimedia.org/r/589674.
* 19:32 Krinkle: Depool mw1407.eqiad.wmnet for opcache and LCStoreStaticArray testing. – [[phab:T99740|T99740]]
* 17:41 cmjohnson1: replacing network cable pc1009 [[phab:T250257|T250257]]
* 17:34 cmjohnson1: moving msw1 to msw-c racks mounted switch cable ports from port 49 to port 50
* 17:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:15 Urbanecm: Revert recent email change of User:CPHL@SUL's email
* 16:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 15:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 15:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 15:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:20 rzl: remove cronjobs from mwmaint1002 previously updated to systemd timers and erroneously left in crontab -- diffs: https://phabricator.wikimedia.org/P11012 [[phab:T211250|T211250]]
* 14:29 mutante: ganeti2001 - killed and restarted gnt-rapi process with the correct new key and cert
* 14:19 cdanis: add peer AS29802 to cr2-eqdfw and cr2-esams
* 14:01 mutante: netbox1001 - netbox_ganeti_eqiad_sync / systemd state fixed after gnt-rapi is running again on ganeti1003
* 14:00 mutante: ganeti1003 - fixing gnt-rapi daemon not running
* 13:54 mateusbs17: Running VACUUM FULL for gis DB in maps2004.codfw.wmnet (which is depooled at the moment)
* 13:00 mutante: netbox1001 - sudo systemctl start netbox_ganeti_eqiad_sync (was failed)
* 12:54 mutante: contint2001 /usr/local/sbin/build-envoy-config -c /etc/envoy ; restart envoyproxy; was not listening on admin port
* 12:45 mutante: contint2001 - restart nagios-nrpe-server
* 12:28 moritzm: copied kubernetes-client from stretch-wikimedia to buster-wikimedia [[phab:T224591|T224591]]
* 11:35 mutante: contint2001 - apt-get update, run puppet to install helm-diff
* 11:33 jayme: imported helm-diff 2.11.0+3-2+deb10u1 to main for buster-wikimedia
* 11:23 dzahn@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 11:23 dzahn@cumin2001: START - Cookbook sre.hosts.decommission
* 11:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:17 _joe_: contint1001:~$ sudo systemctl restart envoyproxy.service
* 10:16 _joe_: contint1001:~$ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy
* 10:07 kormat: change pc2010 to replicate from pc1010 [[phab:T247787|T247787]]
* 09:54 kormat: enabling replication from pc1007 to pc1010 [[phab:T247787|T247787]]
* 09:20 jayme: imported helm 2.12.2 to main for buster-wikimedia
* 09:07 vgutierrez: disable KA between ats-tls and varnish-fe on cp1077 - [[phab:T250258|T250258]]
* 09:00 kormat: dropping wikidatawiki.wb_items_per_site_old table in eqiad (non-labs hosts)  [[phab:T250345|T250345]]
* 08:15 kormat: dropping wikidatawiki.wb_items_per_site_old table in codfw  [[phab:T250345|T250345]]
* 07:54 ema: cache_text: puppet run to stop vhtcpd and start purged [[phab:T249325|T249325]]
* 07:45 gehel: restart wdqs-updater on all nodes after deployment
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11005 and previous config saved to /var/cache/conftool/dbconfig/20200417-063138-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1111 from API', diff saved to https://phabricator.wikimedia.org/P11004 and previous config saved to /var/cache/conftool/dbconfig/20200417-063038-marostegui.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11003 and previous config saved to /var/cache/conftool/dbconfig/20200417-062642-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11002 and previous config saved to /var/cache/conftool/dbconfig/20200417-061907-marostegui.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11001 and previous config saved to /var/cache/conftool/dbconfig/20200417-060419-marostegui.json
 
== 2020-04-16 ==
* 22:34 maryum: reindexing wikis that failed from previous reindex on mwmaint1002
* 22:10 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.26 (duration: 05m 26s)
* 21:59 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/FlaggedRevs/: [[phab:T250439|T250439]] Don't try to create a Revision with null (duration: 01m 02s)
* 21:54 bsitzmann@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 21:51 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 21:48 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 20:42 mstyles@deploy1001: Finished deploy [wdqs/wdqs@1fb52b3]: WDQS version 0.3.22 (duration: 11m 43s)
* 20:30 mstyles@deploy1001: Started deploy [wdqs/wdqs@1fb52b3]: WDQS version 0.3.22
* 20:01 maryum: "beginning deploy of WDQS 0.3.22"
* 19:06 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.28
* 18:57 krinkle@deploy1001: Synchronized errorpages/404.php: {{Gerrit|I9fd5c99130c64}} (duration: 01m 07s)
* 17:52 XioNoX: rename/format asw-ulsfo interfaces to match future homer driven format
* 16:51 herron: kafka-logging eqiad set retention.bytes=500000000000 on topic udp_localhost-warning [[phab:T250133|T250133]]
* 16:45 herron: kafka-logging eqiad set retention.bytes=500000000000 on topic udp_localhost-info [[phab:T250133|T250133]]
* 16:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:54 elukey: restart chi on cloudelastic1001 with -XX:NewRatio=3 - [[phab:T231517|T231517]]
* 15:26 akosiaris: truncate /var/log/ganeti/monitoring-daemon-error.log on ganeti1003, start again all ganeti daemons
* 15:20 akosiaris: stop ganeti daemons on ganeti1003
* 15:02 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Petri Gyula' '23eki' ([[phab:T250387|T250387]])
* 14:51 hknust: holger@mwmaint1002 END (Fail)  uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]]
* 14:30 hknust: holger@mwmaint1002 Starting  uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]]
* 14:21 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 14:17 hnowlan@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Enabling rules on k8s, disabling on scb (duration: 01m 12s)
* 14:16 hnowlan@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Enabling rules on k8s, disabling on scb
* 14:14 dcausse: elastic (search cluster) reindexing commonswiki_content in codfw and eqiad ([[phab:T246882|T246882]])
* 14:13 ema: cache: upgrade varnish to 5.1.3-1wm14 and rolling restart [[phab:T249810|T249810]]
* 13:40 XioNoX: rename/format asw2-esams interfaces to match future homer driven format
* 13:36 kormat: Optimizing all tables on pc1010 [[phab:T247787|T247787]]
* 13:32 hashar: Restarting CI Jenkins for plugin upgrade [[phab:T250377|T250377]]
* 13:04 hnowlan@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again (duration: 00m 30s)
* 13:04 hnowlan@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again
* 13:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 12:54 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 12:54 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 12:48 vgutierrez: pool cp1087
* 12:44 jynus: test sal again
* 11:29 elukey: restart atskafka on cp3050 after maintenance
* 11:22 XioNoX: rename/format asw1-eqsin interfaces to match future homer driven format
* 11:17 elukey: stop atskafka on cp3050 to re-create the topic atskafka_test_webrequest_text on Kafka Jumbo - [[phab:T250347|T250347]]
* 11:16 Urbanecm: EU SWAT done
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|a105f38}}: Remove broken groupOverrides from amwikimedia ([[phab:T249585|T249585]]) (duration: 01m 05s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|70ee5f6}}: Remove grants for tboverride and tboverride-account ([[phab:T241114|T241114]]) (duration: 01m 06s)
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata ([[phab:T250348|T250348]]; take II) (duration: 01m 04s)
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata ([[phab:T250348|T250348]]) (duration: 01m 06s)
* 11:03 urbanecm@deploy1001: sync-file aborted: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata (duration: 00m 00s)
* 10:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:45 hnowlan@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Testing rules moved to k8s (duration: 01m 16s)
* 10:45 vgutierrez: upgrading ATS to version 8.0.7-rc0-1wm3 -  [[phab:T249335|T249335]]
* 10:44 hnowlan@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Testing rules moved to k8s
* 10:44 vgutierrez: rolling restart of ats-tls to enable TLSv1.3 globally and disable the old TLS session cache - [[phab:T170567|T170567]]
* 10:35 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:35 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:31 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:22 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 09:33 elukey: restart atskafka on cp3050 to pick up snappy compression - [[phab:T250347|T250347]]
* 09:32 ema: cp2027: upgrade varnish to 5.1.3-1wm14 [[phab:T249810|T249810]]
* 09:17 ema: text@esams: stop vhtcpd, start purged [[phab:T249325|T249325]]
* 09:16 jynus: starting es backups on backup2002 [[phab:T79922|T79922]]
* 08:33 kormat: Disconnect pc1008 replication from pc1010 [[phab:T247787|T247787]]
* 08:22 ema: cp3050: upgrade purged to 0.7 [[phab:T249583|T249583]]
* 08:22 ema: upload purged 0.7 to buster-wikimedia [[phab:T249583|T249583]]
* 08:21 Urbanecm: Set email for Geraki@grwikimedia ([[phab:T245911|T245911]])
* 08:18 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master [[phab:T247787|T247787]] (duration: 01m 08s)
* 08:06 mutante: mw1396 - restarted php7.2-fpm - was: 503 Service Unavailable - header 'X-Powered-By: PHP/7.' not found on 'http://en.wikipedia.org:80/wiki/Main_Page'
* 08:04 mutante: mw1396 - restarted apache
* 07:50 vgutierrez: rolling update ats to version 8.0.7-rc0-1wm3 in cp[4026,4032,5006,5012] - [[phab:T249335|T249335]]
* 07:49 vgutierrez: upload trafficserver 8.0.7-rc0-1wm3 to apt.wm.o (buster) - [[phab:T249335|T249335]]
* 07:15 volker-e@deploy1001: Finished deploy [design/style-guide@2a7cc4a]: Deploy design/style-guide:  (duration: 00m 08s)
* 07:15 volker-e@deploy1001: Started deploy [design/style-guide@2a7cc4a]: Deploy design/style-guide:
* 06:33 moritzm: installing apache-log4j1.2 security updates on jessie
* 06:29 moritzm: installing icu security updates on jessie
* 06:15 moritzm: installing git security updates on jessie
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reorganize s8 weights a little bit after the addition of the new host db1114', diff saved to https://phabricator.wikimedia.org/P10995 and previous config saved to /var/cache/conftool/dbconfig/20200416-054353-marostegui.json
* 05:33 elukey: restart hadoop-yarn-nodemanager on an-worker108[4,5] - failed after GC OOM events (heavy spark jobs)
 
== 2020-04-15 ==
* 22:11 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/MachineVision: Fix: Initialize categories array for initial images ([[phab:T250321|T250321]]) (duration: 01m 07s)
* 21:48 maryum: removing duplicate indices from production ES clusters that were created when reindexing failed
* 20:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1907571]: Update mobileapps to {{Gerrit|ff34d0b5}} (duration: 04m 57s)
* 20:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1907571]: Update mobileapps to {{Gerrit|ff34d0b5}}
* 19:53 addshore: pool wdqs1006 caught up
* 19:44 addshore: depool wdqs1006 to catch up on lag
* 19:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.28 (duration: 01m 05s)
* 19:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.28
* 18:44 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Idc81a885b2f3}}, [[phab:T196309|T196309]] (duration: 01m 07s)
* 18:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 18:10 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:589029{{!}}Fix GrowthExperiments helpdesk URL for frwiktionary (T235964)]] (duration: 01m 06s)
* 16:08 volker-e@deploy1001: Finished deploy [design/style-guide@a4d5794]: Deploy design/style-guide:  (duration: 00m 11s)
* 16:08 volker-e@deploy1001: Started deploy [design/style-guide@a4d5794]: Deploy design/style-guide:
* 15:46 ejegg: updated fundraising CiviCRM from {{Gerrit|18d7567cd7}} to {{Gerrit|1224b080c1}}
* 15:36 ema: cp2029,cp3050: upgrade purged to 0.6, restart varnish-fe [[phab:T249583|T249583]]
* 15:30 ema: upload purged 0.6 to buster-wikimedia [[phab:T249583|T249583]]
* 15:19 papaul: upgrading firmware on restbase2014
* 14:36 vgutierrez: rolling upgrade to ATS 8.0.7-rc0-1wm2 on cp[3064,3065,2042,2041,1090,1089] - [[phab:T249335|T249335]]
* 14:32 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: Drop 'parsoidphp' service, we use 'parsoid' now (duration: 01m 06s)
* 14:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Use 'parsoid' service in lieu of 'parsoidphp' (duration: 01m 07s)
* 14:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
* 14:23 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: Add 'parsoid' service to replace 'parsoidphp' (duration: 01m 06s)
* 14:17 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Use MediaWikiServices::getAuthManager on wikitech (duration: 01m 06s)
* 14:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242912|T242912]] Remove wgEnablePartialBlocks config, no longer read (duration: 01m 07s)
* 14:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 (duration: 01m 06s)
* 14:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
* 14:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T250181|T250181]] [[phab:T250183|T250183]] Wikibase: Use false instead of database names for 'local' entity sources on test wikis (duration: 01m 06s)
* 14:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 14:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread (duration: 01m 06s)
* 13:59 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true (duration: 01m 06s)
* 13:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgContentHandlerUseDB, now unread (duration: 01m 06s)
* 13:32 ema: upload varnish_5.1.3-1wm14 to buster-wikimedia [[phab:T249810|T249810]]
* 13:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Flow/Hooks.php: [[phab:T248727|T248727]] Adjust to RevisionUndeleted hook now having  (duration: 01m 04s)
* 13:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/LiquidThreads/classes/DeletionController.php: [[phab:T248727|T248727]] Adjust to RevisionUndeleted hook now having  (duration: 01m 06s)
* 13:23 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/includes/page/PageArchive.php: [[phab:T248727|T248727]] Fix RevisionUndeleted hook to add  (duration: 01m 08s)
* 13:23 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight to 100% of target, and reduce db1104 slightly [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10990 and previous config saved to /var/cache/conftool/dbconfig/20200415-132310-kormat.json
* 13:10 hashar: contint2001: starting zuul-merger process # [[phab:T224591|T224591]]
* 12:49 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight to 50% of target [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10989 and previous config saved to /var/cache/conftool/dbconfig/20200415-124931-kormat.json
* 12:41 vgutierrez: rolling upgrade to ATS 8.0.7-rc0-1wm2 in ulsfo and eqsin - [[phab:T249335|T249335]]
* 12:03 mutante: puppetmaster1001: revoking ganeti01.svc.eqiad.wmnet and ganeti01.svc.codfw.wmnet certificates. adding eqiad and codfw to cergen .yaml file, recreating ganeti certs
* 11:27 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 01m 03s)
* 11:26 awight@deploy1001: sync-file aborted: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 00m 02s)
* 11:23 awight: EU SWAT complete
* 11:22 awight@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/TwoColConflict: SWAT: [[gerrit:588966{{!}}Flatten exit logging (T248601)]] (duration: 01m 09s)
* 11:09 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (duration: 01m 24s)
* 10:57 marostegui: Deploy schema change on s8 codfw master - [[phab:T250057|T250057]]
* 10:25 ema: cp3050: varnish-frontend-restart to clear mbox lag and see how long it takes to show up [[phab:T249583|T249583]]
* 10:02 ema: upload purged 0.5 to buster-wikimedia [[phab:T249583|T249583]]
* 09:50 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 09:48 vgutierrez: disable KA between ats-tls and varnish-fe for POST requests on eqiad - [[phab:T250258|T250258]]
* 09:45 godog: force-run curator from logstash1008 - [[phab:T250133|T250133]]
* 09:43 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight some more [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10988 and previous config saved to /var/cache/conftool/dbconfig/20200415-094305-kormat.json
* 09:08 elukey: restart druid brokers on druid100[4-6] - stuck after datasource deletion
* 09:07 vgutierrez: repool cp1081
* 08:54 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10986 and previous config saved to /var/cache/conftool/dbconfig/20200415-085432-kormat.json
* 08:54 vgutierrez: depool cp1081 for debugging purposes
* 08:46 XioNoX: reset edac counters on scb1001
* 08:43 dcausse: errata: elastic (search cluster) reindexing commonswiki_content on cloudelastic ([[phab:T246882|T246882]])
* 08:42 dcausse: elastic (search cluster) reindex commmonswiki_content on cloudelastic ([[phab:T246882|T246882]])
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 on s8 with low weight [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10985 and previous config saved to /var/cache/conftool/dbconfig/20200415-081421-marostegui.json
* 07:59 marostegui: Deploy schema change on s7 codfw master - [[phab:T250057|T250057]]
* 07:35 elukey: restart cloudelastic-chi on cloudelastic1002 to apply new jvm settings - [[phab:T231517|T231517]]
* 06:55 mutante: install1003 moving /srv/autoinstall to /root, running puppet, leaving a README file to point out it moved to apt1001
* 06:47 marostegui: Deploy schema change on s6 codfw with replication - [[phab:T250057|T250057]]
* 06:43 marostegui: Deploy schema change on labtestwiki - [[phab:T250057|T250057]]
* 06:43 XioNoX: re-set asw2-c-eqiad's licenses
* 06:42 marostegui: Deploy schema change on labswiki - [[phab:T250057|T250057]]
* 06:32 XioNoX: set uRPF log action back to log infra wide - [[phab:T244147|T244147]]
* 06:04 vgutierrez: update to ats 8.0.7-rc0-1wm2 on cp[5006,5012] - [[phab:T249335|T249335]]
* 05:49 moritzm: installing git security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:22 kart_: Update cxserver to 2020-04-13-094138-production ([[phab:T239459|T239459]], [[phab:T249469|T249469]])
* 05:21 marostegui: Remove db1114 from tendril and zarcillo [[phab:T250224|T250224]]
* 05:17 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:13 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:11 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:07 marostegui: Remove db1114 from tendril - [[phab:T250224|T250224]]
 
== 2020-04-14 ==
* 23:24 AndyRussG: re-enabled thank-you, omnimailing and new recurring charge jobs
* 22:59 AndyRussG: disabled thank-you and omnimailing jobs
* 22:59 AndyRussG: fundraising civicrm revision changed from {{Gerrit|59e712ce8e}} to {{Gerrit|18d7567cd7}}
* 21:36 addshore: pool wdqs1006, it is caught up
* 21:03 addshore: depool wdqs1006 to give it a chance to catch up on lag
* 20:34 cdanis@cumin1001: dbctl commit (dc=all): 'tweak db1111 weight yet again', diff saved to https://phabricator.wikimedia.org/P10979 and previous config saved to /var/cache/conftool/dbconfig/20200414-203426-cdanis.json
* 20:18 James_F: Adding Create-Signed-Tag right to wikimedia-ui-base group for wikimedia-ui-base repo
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Change s8 weights', diff saved to https://phabricator.wikimedia.org/P10978 and previous config saved to /var/cache/conftool/dbconfig/20200414-201412-marostegui.json
* 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'reduce db1126 weight due to cpu issues', diff saved to https://phabricator.wikimedia.org/P10977 and previous config saved to /var/cache/conftool/dbconfig/20200414-195855-marostegui.json
* 19:57 cdanis@cumin1001: dbctl commit (dc=all): '+db1111, -db1126', diff saved to https://phabricator.wikimedia.org/P10976 and previous config saved to /var/cache/conftool/dbconfig/20200414-195734-cdanis.json
* 19:51 cdanis@cumin1001: dbctl commit (dc=all): 'more weight to db1104', diff saved to https://phabricator.wikimedia.org/P10975 and previous config saved to /var/cache/conftool/dbconfig/20200414-195100-cdanis.json
* 19:47 cdanis@cumin1001: dbctl commit (dc=all): '+weight on db1104@s8', diff saved to https://phabricator.wikimedia.org/P10974 and previous config saved to /var/cache/conftool/dbconfig/20200414-194710-cdanis.json
* 19:26 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.28
* 19:22 ebernhardson@deploy1001: Finished scap: wmf-config/PoolCounterSettings.php cirrus: increase pool counter size for traffic shift to codfw (duration: 21m 55s)
* 19:00 ebernhardson@deploy1001: Started scap: wmf-config/PoolCounterSettings.php cirrus: increase pool counter size for traffic shift to codfw
* 18:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:35 jforrester@deploy1001: Finished scap: Testwikis to php-1.35.0-wmf.28 and rebuild i18n cache for [[phab:T247775|T247775]] (duration: 42m 37s)
* 17:26 ppchelko@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again (duration: 00m 56s)
* 17:25 ppchelko@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again
* 17:23 ppchelko@deploy1001: deploy aborted: Rollback removing k8s rules, again (duration: 00m 05s)
* 17:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Rollback removing k8s rules, again
* 17:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] attempt 2 (duration: 00m 25s)
* 17:12 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] attempt 2
* 17:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:07 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:52 jforrester@deploy1001: Started scap: Testwikis to php-1.35.0-wmf.28 and rebuild i18n cache for [[phab:T247775|T247775]]
* 16:49 jforrester@deploy1001: sync aborted: testwikis wikis to 1.35.0-wmf.28 (duration: 00m 05s)
* 16:49 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.28
* 16:38 akosiaris: stop all ganeti components (VMs are fine) on all ganeti2* hosts for key/cert rollover
* 16:38 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.25 (duration: 17m 20s)
* 16:20 James_F: Scap cleaning 1.35.0-wmf.25 [[phab:T247775|T247775]]
* 16:07 ariel@deploy1001: Finished deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression, retry (duration: 00m 04s)
* 16:06 ariel@deploy1001: Started deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression, retry
* 16:06 ppchelko@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules (duration: 01m 20s)
* 16:06 ejegg: disabled new recurring payments charge job
* 16:05 ppchelko@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules
* 16:04 ariel@deploy1001: Finished deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression (duration: 00m 04s)
* 16:04 ariel@deploy1001: Started deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression
* 15:52 ema: cp3050: suspend purged testing, varnish-frontend-restart to clear mailbox lag [[phab:T249583|T249583]]
* 15:50 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:49 James_F: 1.35.0-wmf.28 was branched at {{Gerrit|ded5b87df12cea88d94dde0fa22cac13227f8e92}} for [[phab:T247775|T247775]]
* 15:47 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:19 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:17 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:15 vgutierrez: update to ats 8.0.7-rc0-1wm2 on cp[4026,4032] - [[phab:T249335|T249335]]
* 15:13 vgutierrez: upload trafficserver 8.0.7-rc0-1wm2 to apt.wm.o (buster) - [[phab:T249335|T249335]]
* 15:12 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:11 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:44 ppchelko@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] (duration: 01m 58s)
* 14:42 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]]
* 14:34 godog: power down ms-be1023 - [[phab:T249174|T249174]]
* 14:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 elukey: enable TLS between weblog1001,mwlog2001.codfw.wmnet,mwlog1001 and Kafka Jumbo/Logging - [[phab:T250147|T250147]]
* 14:15 hashar: Rebasing mediawiki-config on deploy1001 for a deployment-prep config change ( https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/588706/ )
* 14:12 ema: cp3050: resume purged testing [[phab:T249583|T249583]]
* 13:55 ema: upload purged 0.4 to buster-wikimedia [[phab:T249583|T249583]]
* 13:21 hashar: Starting zuul-merger on contint2001
* 12:50 vgutierrez: Enable inbound TLSv1.3 in text@eqsin - [[phab:T170567|T170567]]
* 12:03 jbond42: upgrade haproxy on dns servers
* 11:08 Urbanecm: EU SWAT done
* 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/cswiki*.png ([[phab:T249173|T249173]])
* 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|7da408e}}: Revert "Enable cswiki anniversary logo" ([[phab:T249173|T249173]]) (duration: 01m 00s)
* 11:01 jynus: resizing backup1001:/srv/databases to 40 TB
* 10:55 XioNoX: set uRPF log action to syslog infra wide - [[phab:T244147|T244147]]
* 10:15 XioNoX: update prefix-list LVS-service-ips to add missing prefixes
* 09:49 XioNoX: re-order aggregate routes to standardize order
* 09:48 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr2-eqdfw - [[phab:T246721|T246721]]
* 09:47 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr2-eqord - [[phab:T246721|T246721]]
* 09:37 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr1/2-codfw - [[phab:T246721|T246721]]
* 09:17 XioNoX: add missing `routing-options rib inet6.0 aggregate defaults discard` where missing (cr3-knams, cr3-esams, cr2-eqord, cr2-eqdfw, cr1/2-eqiad/codfw)
* 09:13 godog: add mwilliams to 'wmf' ldap group - [[phab:T249844|T249844]]
* 09:08 marostegui: Add kormat to ops and wmf ldap groups - [[phab:T250134|T250134]]
* 08:49 elukey: restart elastic-chi on cloudelastic1001 with -XX:NewSize=10G - [[phab:T231517|T231517]]
* 07:33 elukey: apply CMS GC settings to chi on cloudelastic1001 - [[phab:T231517|T231517]]
* 05:30 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in esams and eqiad
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 01m 00s)
 
== 2020-04-13 ==
* 23:24 mdholloway: re-ran extensions/MachineVision/maintenance/withholdImages.php on commonswiki
* 23:14 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision withholding list additions ([[phab:T249939|T249939]]) (duration: 00m 59s)
* 22:41 cdanis: repool codfw
* 22:35 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1052 for excessive old gc over last few hours
* 22:35 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1052
* 22:08 cdanis: depool codfw
* 21:43 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki ([[phab:T249273|T249273]])
* 21:34 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on testcommonswiki
* 21:32 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add script to apply blacklist to current labels ([[phab:T249273|T249273]]) (duration: 00m 58s)
* 20:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision blocklist update ([[phab:T249895|T249895]]) (duration: 00m 59s)
* 19:56 mdholloway: finished running extensions/MachineVision/maintenance/withholdImages.php on commonswiki ([[phab:T249939|T249939]])
* 19:51 mdholloway: running extensions/MachineVision/maintenance/withholdImages.php on commonswiki
* 19:41 mdholloway: ran extensions/MachineVision/maintenance/withholdImages.php on testcommonswiki
* 19:37 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add support for WITHHOLD_ALL review state ([[phab:T249939|T249939]]) (duration: 01m 23s)
* 19:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Add MachineVisionWithholdImageList config ([[phab:T249939|T249939]]) (duration: 01m 03s)
* 19:06 niedzielski: Morning SWAT done
* 19:02 niedzielski@deploy1001: Synchronized php-1.35.0-wmf.27/skins/MinervaNeue: SWAT: [[gerrit:588405{{!}}Update the icon glyph (T249864)]] (duration: 01m 00s)
* 18:49 niedzielski@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TwoColConflict: SWAT: [[gerrit:588370{{!}}Fix double HTML escaping of "copytext" lines in the diff (T249986)]] (duration: 01m 01s)
* 17:01 XioNoX: sample before any other border-in terms in eqiad
* 16:57 XioNoX: sample before any other border-in terms in esams
* 16:50 XioNoX: sample before any other border-in terms in dfw
* 16:46 XioNoX: sample before any other border-in terms in ulsfo
* 16:36 XioNoX: sample before any other border-in terms in eqsin
* 16:36 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:33 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:31 XioNoX: Sample all inbound v6 traffic on cr2-eqsin
* 16:31 cmjohnson1: replacing msw-c6-eqiad
* 16:30 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 15:56 marostegui: Deploy schema change on s4 codfw master - [[phab:T250067|T250067]]
* 12:12 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in eqsin and codfw
* 11:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 11:57 marostegui: Deploy schema change on eqiad s8 hosts - [[phab:T250062|T250062]]
* 11:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 marostegui: Deploy schema change on codfw master (lag will appear on codfw) - [[phab:T250062|T250062]]
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|efe2feb}}: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki ([[phab:T248860|T248860]]; take II) (duration: 00m 58s)
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|efe2feb}}: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki ([[phab:T248860|T248860]]) (duration: 00m 58s)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:588383{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:588383{{!}} Bumping portals to master (563985)]] (duration: 01m 00s)
* 10:24 mutante: depooled wdqs1004 by request because of high lag
* 10:19 marostegui: Kill updateSpecialPages.php --only=Fewestrevisions for s8 in mwmaint1002, the vslow host is lagging and creating errors
* 10:12 mutante: mwmaint1002 - sudo systemctl status mediawiki_job_translationnotifications-mediawikiwiki.service
* 09:52 Urbanecm: Rename user account Gerakiw@grwikimedia to Geraki@grwikimedia ([[phab:T245911|T245911]])
* 09:47 Urbanecm: mwscript createAndPromote.php --wiki=grwikimedia --force Gerakiw <redacted> ([[phab:T245911|T245911]])
* 08:15 marostegui: Remove grants for haproxy@10.64.37.15 from labsdb hosts [[phab:T231280|T231280]]
* 07:50 vgutierrez: enable memory tracking in ats-tls on cp1085 - [[phab:T249335|T249335]]
* 07:43 marostegui: Compress db1092 [[phab:T232446|T232446]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Temporary pool db1111 in s8 API', diff saved to https://phabricator.wikimedia.org/P10964 and previous config saved to /var/cache/conftool/dbconfig/20200413-074158-marostegui.json
* 07:40 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in ulsfo
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10963 and previous config saved to /var/cache/conftool/dbconfig/20200413-073939-marostegui.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 [[phab:T249973|T249973]]', diff saved to https://phabricator.wikimedia.org/P10962 and previous config saved to /var/cache/conftool/dbconfig/20200413-071740-marostegui.json
* 06:51 marostegui: Deploy schema changes on db1110 - [[phab:T249973|T249973]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 [[phab:T249973|T249973]]', diff saved to https://phabricator.wikimedia.org/P10961 and previous config saved to /var/cache/conftool/dbconfig/20200413-065022-marostegui.json
* 06:36 elukey: temporarily stopped puppet on restbase2014 to avoid attempts to start cassandra on each run - [[phab:T250050|T250050]]
* 06:23 vgutierrez: upgrade to ats 8.0.7-rc0-1wm1 on cp[4026,4032,5006,5012]
* 06:20 vgutierrez: upload trafficserver 8.0.7-rc0-1wm1 to apt.wm.o (buster)
* 05:25 vgutierrez: restart varnish-fe on cp3050
 
== 2020-04-12 ==
* 11:11 vgutierrez: restart ats-tls on cp5008.eqsin.wmnet - [[phab:T249335|T249335]]
* 10:18 elukey: restart wdqs-updater on wdqs1004 (logs show no reports from the past hours, last ones were stack traces related to a json decode failure)
* 06:59 dcausse: restarting blazegraph on wdqs1004 ([[phab:T242453|T242453]])
* 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet
* 06:32 elukey: powerdown restbase1025 - [[phab:T250027|T250027]]
* 06:21 elukey: powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2)
* 05:53 bblack: pushing https://gerrit.wikimedia.org/r/588134 to cache_text
* 05:50 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet- [[phab:T249335|T249335]]
 
== 2020-04-11 ==
* 19:52 cdanis@cumin1001: dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json
* 17:35 cdanis@cumin1001: dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json
* 15:39 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet- [[phab:T249335|T249335]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 09:20 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 07:01 vgutierrez: restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet- [[phab:T249335|T249335]]
 
== 2020-04-10 ==
* 21:12 cdanis@cumin1001: dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json
* 19:37 cdanis: cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220
* 15:03 vgutierrez: restart ats-tls on cp1083 and cp1085 - [[phab:T249335|T249335]]
* 13:14 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s)
* 13:14 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 13:12 mutante: restarted and re-armed keyholder on deploy1001 to pick up changes for zuul scap deploy
* 12:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:11 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. ([[phab:T249907|T249907]])
* 12:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. This may take a few minutes.
* 12:10 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:09 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 11:44 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 11:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10953 and previous config saved to /var/cache/conftool/dbconfig/20200410-094359-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10952 and previous config saved to /var/cache/conftool/dbconfig/20200410-093129-marostegui.json
* 08:52 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 16s)
* 08:51 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 08:46 hashar@deploy1001: Finished deploy [zuul/deploy@5a0a03a]: (no justification provided) (duration: 02m 20s)
* 08:44 hashar@deploy1001: Started deploy [zuul/deploy@5a0a03a]: (no justification provided)
* 08:39 mutante: deploy1001 - keyholder disarm, keyholder arm
* 08:32 mutante: fix comment in deployment ssh key for zuul to include the path to the key on deploy1001
* 08:24 vgutierrez: update puppet compiler facts
* 08:20 hashar@deploy1001: Finished deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) (duration: 00m 11s)
* 08:19 hashar@deploy1001: Started deploy [integration/zuul/deploy@6c3ddad]: (no justification provided)
* 08:03 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 05s)
* 08:03 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 07:52 mutante: closing port 80 on phab hosts for caching servers
* 07:37 ema: cp3050: back to vhtcpd for the holidays [[phab:T249583|T249583]]
* 07:00 mutante: sodium - sudo -u mirror ftpsync
* 06:58 mutante: armed keyholder on deploy1001
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:00 marostegui: Stop MySQL on pc1008 for upgrade
 
== 2020-04-09 ==
* 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:27 catrope@deploy1001: Synchronized wmf-config/mobile.php: Drop fallback support for wgMobileFrontendLogo ([[phab:T248500|T248500]]) (duration: 00m 58s)
* 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop unused config for main page CSS ([[phab:T243996|T243996]]) (duration: 00m 58s)
* 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add extendedconfirmed group and protection level on jawiki ([[phab:T249820|T249820]]) (duration: 00m 59s)
* 22:01 sukhe: running initial metadb sync on cescout1001
* 19:43 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:41 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:39 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 19:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:01 longma: deploying 1.35.0-wmf.27 to all wikis
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:24 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:32 XioNoX: disable down interfaces from fasw-c-codfw (mintaka)
* 13:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:43 mlitn@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision/: [MachineVision] Fix statement creation from suggestion (duration: 01m 09s)
* 12:31 ema: cp3051: upgrade varnish to 5.1.3-1wm13 once again, restart varnish-fe [[phab:T249809|T249809]]
* 11:57 XioNoX: offload more traffic from NTT eqiad - [[phab:T249808|T249808]]
* 11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]], take II (duration: 01m 06s)
* 11:19 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]] (duration: 01m 07s)
* 10:50 vgutierrez: rolling upgrade to trafficserver 8.0.6-1wm7
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:50 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:43 ema: repool cp3051 [[phab:T249809|T249809]]
* 10:30 ema: cp3051: re-enable transient storage limit, downgrade varnish to 5.1.3-1wm12 (no 0035-vbf_stp_condfetch_crash.patch) and restart varnish-fe [[phab:T249809|T249809]]
* 09:46 ema: cp3051: disable transient storage limit and restart varnish-fe [[phab:T249809|T249809]]
* 09:31 XioNoX: offload traffic from NTT eqiad - [[phab:T249808|T249808]]
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) ([[phab:T224591|T224591]])
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
* 07:24 moritzm: synched jenkins 222.1 to apt.wikimedia.org (buster-wikimedia, thirdparty/ci) [[phab:T224591|T224591]]
* 07:12 marostegui: Repool labsdb1011
* 07:10 XioNoX: switch urpf from log to syslog in ulsfo
* 07:04 XioNoX: re-activate BGP to Zayo in eqiad
* 06:59 vgutierrez: upgrade ats to version 8.0.6-1wm7 in cp[4026,4032,5006,5012]
* 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:43 XioNoX: confirmed on one host that the change didn't break logstash. Re-enable Puppet on logstash hosts - [[phab:T244147|T244147]]
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 XioNoX: disabling puppet on logstash host for CR deploy - [[phab:T244147|T244147]]
* 06:30 XioNoX: push urpf log only to eqiad - [[phab:T244147|T244147]]
* 06:25 XioNoX: push urpf log only to eqsin - [[phab:T244147|T244147]]
* 06:21 XioNoX: push urpf log only to AMS - [[phab:T244147|T244147]]
* 05:40 vgutierrez: upgrade ats to version 8.0.6-1wm6 in cp[4025,4031,5005,5011] - [[phab:T249335|T249335]]
* 05:37 marostegui: Stop MySQL on pc2008 for upgrade to Buster and 10.4
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 01m 08s)
* 05:08 marostegui: Deploy schema change on db1123
* 05:07 vgutierrez: upload trafficserver 8.0.6-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]]
 
== 2020-04-08 ==
* 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 07s)
* 21:19 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 09s)
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 19:51 gehel: restart wdqs-updater after deployment
* 19:49 mstyles@deploy1001: Finished deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21 (duration: 14m 37s)
* 19:44 dpifke@deploy1001: Finished deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091 (duration: 00m 05s)
* 19:44 dpifke@deploy1001: Started deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091
* 19:35 mstyles@deploy1001: Started deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21
* 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]] (duration: 01m 06s)
* 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:02 longma: deploying 1.35.0-wmf.27 to group1
* 18:37 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/skins/Vector: [[phab:T248761|T248761]]: Revert moving indicators in DOM (duration: 01m 07s)
* 18:17 reedy@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 06s)
* 18:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 10s)
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 ema: cache_upload: rolling varnish-fe restarts to bump transient storage limit [[phab:T185968|T185968]]
* 15:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 ema: cp3051: param.set shortlived=0 to try to ease pressure on transient memory
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P10947 and previous config saved to /var/cache/conftool/dbconfig/20200408-142341-marostegui.json
* 14:14 jeh@deploy1001: Finished deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups (duration: 03m 30s)
* 14:10 jeh@deploy1001: Started deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups
* 13:40 mutante: stopped and masked zuul-merger service on contint2001 via puppet ([[phab:T224591|T224591]])
* 13:30 ema: cp3050: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 13:22 vgutierrez: enable inbound TLSv1.3 in text@ulsfo - [[phab:T170567|T170567]]
* 13:05 ema: purged 0.1 uploaded to buster-wikimedia [[phab:T249583|T249583]]
* 12:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 12:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585219{{!}}Enable GrowthExperiments suggested edits on uk, hu, hy, eu wikipedias (T247308)]] (duration: 01m 08s)
* 12:17 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584135{{!}}Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias (T238295)]] (duration: 01m 08s)
* 12:09 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 06s)
* 11:56 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 03s)
* 11:48 mutante: logstash1009 - restarted logstash
* 11:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585766{{!}}Enable WikibaseQualityConstraints on test commons (T248117)]] (duration: 01m 05s)
* 11:43 marostegui: Deploy schema change on db1112, this will generate lag on labs s3
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P10942 and previous config saved to /var/cache/conftool/dbconfig/20200408-114315-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P10941 and previous config saved to /var/cache/conftool/dbconfig/20200408-113901-marostegui.json
* 11:29 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 06s)
* 11:28 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 17s)
* 11:05 XioNoX: push urpf log only to codfw - [[phab:T244147|T244147]]
* 10:39 jbond42: restarting idp.wikimedia.org
* 10:14 marostegui: Deploy schema change on db1078
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P10940 and previous config saved to /var/cache/conftool/dbconfig/20200408-101431-marostegui.json
* 09:30 jynus: stopping and removing db1095:s8 instance
* 09:20 godog: upgrade grafana on cloudmetrics hosts - [[phab:T244208|T244208]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P10939 and previous config saved to /var/cache/conftool/dbconfig/20200408-091728-marostegui.json
* 09:11 gehel: setting weight=10 for all pooled wdqs servers in codfw - [[phab:T246343|T246343]]
* 09:10 marostegui: Reload proxies on dbproxy1018 and dbproxy1019 to depool labsdb1011 - [[phab:T249188|T249188]] [[phab:T248592|T248592]]
* 09:07 gehel: pooling wdqs200[78] - new servers ready to go! - [[phab:T246343|T246343]]
* 08:46 marostegui: Rename wb_terms and recreate views on labsdb1009-labsdb1011 - [[phab:T248592|T248592]] [[phab:T248086|T248086]]
* 08:39 godog: upgrade grafana on grafana1002 - [[phab:T244208|T244208]]
* 08:17 _joe_: switching parsoid to envoy (take 2) in eqiad
* 07:23 marostegui: Deploy schema change on db1075
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P10937 and previous config saved to /var/cache/conftool/dbconfig/20200408-072331-marostegui.json
* 06:31 marostegui: Deploy schema change on db1095:3313
* 06:11 marostegui: Stop haproxy on dbproxy1011 - [[phab:T231520|T231520]]
* 05:44 vgutierrez: rolling upgrade ATS to 8.0.6-1wm6 in cp[5006,5012,3065,3064,2042,2041,1090,1089]
* 05:34 marostegui: Deploy schema change on dbstore1004:3313
* 05:33 _joe_: repooling wtp1025, with envoy and logging any error above 404 [[phab:T249535|T249535]]
* 04:36 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]
 
== 2020-04-07 ==
* 20:39 andrewbogott: correction: briefly downtiming ldap-eqiad-replica0 and ldap-eqiad-replica1.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 20:37 andrewbogott: briefly downtiming serpens and seaborgium.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 20:34 hoo: (Take 3) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 20:17 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 20:09 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.27 (duration: 60m 34s)
* 20:08 hoo: (Take 2) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:45 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:13 XioNoX: push pfw firewall rules - [[phab:T249650|T249650]]
* 19:08 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.27
* 18:48 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.24 (duration: 12m 44s)
* 17:56 herron: increasing codfw.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 17:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) retry (duration: 01m 02s)
* 17:54 addshore: last sync stuck on sync-masters
* 17:54 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) (duration: 01m 16s)
* 17:49 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@83c93d1]: Try to make it notice new partitions [[phab:T240702|T240702]]
* 17:40 herron: increasing eqiad.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 16:24 longma: 1.35.0-wmf.27 was branched at {{Gerrit|e76ac29cd9c57bed4097ec8a4ea8311fb55fd967}} for [[phab:T247774|T247774]]
* 16:16 hashar: restarting CI jenkins
* 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:21 moritzm: installing idp-test2001
* 15:20 XioNoX: enable uRPF loose mode (log only) on cr4-ulsfo - [[phab:T244147|T244147]]
* 15:17 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (12/14.5h) (duration: 01m 00s)
* 15:10 ema: cp3052: stop purged, start vhtcpd [[phab:T249583|T249583]] [[phab:T241232|T241232]]
* 15:00 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:56 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (10/14.5h) (duration: 00m 55s)
* 14:52 jeh: cloudvirt2003-dev: downtime in icinga and reboot to enable BIOS virtualization support [[phab:T249453|T249453]]
* 14:38 ema: cp3052: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 14:35 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (8/14.5h) (duration: 00m 58s)
* 14:25 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (4/14.5h) (duration: 00m 58s)
* 14:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (2/14.5h) (duration: 00m 58s)
* 14:08 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) take 2 (duration: 00m 57s)
* 13:57 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: REVERT [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 58s)
* 13:55 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 29s)
* 13:17 vgutierrez: restart ats-tls on cp3056 - [[phab:T249335|T249335]]
* 12:59 vgutierrez: restart ats-tls on cp3052 - [[phab:T249335|T249335]]
* 12:50 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-6.list > [[phab:T249596|T249596]]-6.out # [[phab:T249565|T249565]]
* 12:43 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-5.list > [[phab:T249596|T249596]]-5.out # [[phab:T249565|T249565]]
* 12:42 vgutierrez: restart ats-tls on cp3058 - [[phab:T249335|T249335]]
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:06 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-4.list > [[phab:T249596|T249596]]-4.out # [[phab:T249565|T249565]] [[phab:T249596|T249596]]
* 12:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'repool db1126', diff saved to https://phabricator.wikimedia.org/P10932 and previous config saved to /var/cache/conftool/dbconfig/20200407-115228-marostegui.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1126', diff saved to https://phabricator.wikimedia.org/P10931 and previous config saved to /var/cache/conftool/dbconfig/20200407-115154-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092, db1111, db1099:3318 after table rename', diff saved to https://phabricator.wikimedia.org/P10930 and previous config saved to /var/cache/conftool/dbconfig/20200407-115058-marostegui.json
* 11:50 jynus: renaming wb_items_per_site_recovered to wb_items_per_site on s8
* 11:45 jynus: stopping s8 replication on db1116:3318, db1095:3318, db2079
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092, db1111, db1099:3318 for table rename', diff saved to https://phabricator.wikimedia.org/P10929 and previous config saved to /var/cache/conftool/dbconfig/20200407-114258-marostegui.json
* 11:36 Amir1: stopped the rebuild script ([[phab:T249565|T249565]])
* 11:34 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup [[phab:T203888|T203888]], Remove old unused RejectParserCacheValue hook (duration: 00m 59s)
* 11:09 marostegui: Deploy schema change on s3 codfw
* 11:07 jynus: starting recovery on all s8 hosts
* 10:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:41 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php: [[phab:T249565|T249565]] [[phab:T249596|T249596]] Wikibase rebuildItemsPerSite.php script that allows lists of ids (duration: 01m 00s)
* 10:27 jynus: starting recovery on db1099:3318
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P10927 and previous config saved to /var/cache/conftool/dbconfig/20200407-095852-marostegui.json
* 09:49 volans@deploy1001: Finished deploy [homer/deploy@887544c]: Release v0.2.0 (take 2) (duration: 00m 26s)
* 09:49 volans@deploy1001: Started deploy [homer/deploy@887544c]: Release v0.2.0 (take 2)
* 09:38 marostegui: Deploy schema change on db1119
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P10926 and previous config saved to /var/cache/conftool/dbconfig/20200407-093820-marostegui.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P10925 and previous config saved to /var/cache/conftool/dbconfig/20200407-093638-marostegui.json
* 09:31 volans@deploy1001: Finished deploy [homer/deploy@b4522ad]: Release v0.2.0 (duration: 00m 16s)
* 09:31 volans@deploy1001: Started deploy [homer/deploy@b4522ad]: Release v0.2.0
* 09:29 volans@deploy1001: Finished deploy [homer/deploy@ac7a818]: Inject plugins (take 3) (duration: 03m 03s)
* 09:26 volans@deploy1001: Started deploy [homer/deploy@ac7a818]: Inject plugins (take 3)
* 09:19 marostegui: Deploy schema change on db1134
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P10924 and previous config saved to /var/cache/conftool/dbconfig/20200407-091847-marostegui.json
* 09:17 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (take 2) (duration: 00m 29s)
* 09:17 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins (take 2)
* 09:04 vgutierrez: testing ATS 8.0.6-1wm6 on cp4026 and cp4032
* 08:58 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (duration: 04m 59s)
* 08:53 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins
* 08:46 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v4 uplinks - [[phab:T244147|T244147]]
* 08:44 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v6 uplinks - [[phab:T244147|T244147]]
* 08:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:37 mutante: decom ganeti VM miscweb1001 (stretch) - kept backup of old racktables files and db dump in /root/racktables on miscweb1002 ([[phab:T247648|T247648]])
* 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:30 mutante: decom ganeti VM miscweb2001 (stretch)
* 08:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P10923 and previous config saved to /var/cache/conftool/dbconfig/20200407-082607-marostegui.json
* 08:17 moritzm: installing php5 security updates
* 08:06 marostegui: Deploy schema change on db1106 (this will generate lag on s1 labs)
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change', diff saved to https://phabricator.wikimedia.org/P10922 and previous config saved to /var/cache/conftool/dbconfig/20200407-080533-marostegui.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P10921 and previous config saved to /var/cache/conftool/dbconfig/20200407-080443-marostegui.json
* 07:52 _joe_: disabling puppet on mwdebug1002
* 07:47 marostegui: Failover dbproxy1011 to dbproxy1019 - [[phab:T231520|T231520]]
* 07:43 marostegui: Deploy schema change on db1080
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P10920 and previous config saved to /var/cache/conftool/dbconfig/20200407-074321-marostegui.json
* 07:41 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]] (duration: 01m 28s)
* 07:40 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]]
* 07:39 _joe_: depooling wtp1025, used for debugging
* 07:31 vgutierrez: enable parent proxies in ats-tls - [[phab:T249335|T249335]]
* 07:19 jynus: restarting s3 on db1095
* 07:02 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 06:55 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:53 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:52 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:37 moritzm: installing ruby2.1 security updates
* 06:32 jynus: stopping slave (s3) on db1095
* 05:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]], take II (duration: 00m 58s)
* 05:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]] (duration: 01m 00s)
* 05:26 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/maintenance/: [[phab:T157651|T157651]] Remove sql.php from maintenance/ (duration: 00m 58s)
* 01:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/autoload.php: [[phab:T157651|T157651]] Remove sql.php from autoloader (duration: 00m 58s)
* 01:05 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: [[phab:T208425|T208425]] [[phab:T249565|T249565]] Follow-up {{Gerrit|a956c655}}: Only avoid dropping wb_items_per_site so prod can be merged (duration: 00m 58s)
* 00:01 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] cache bust (duration: 01m 01s)
 
== 2020-04-06 ==
* 23:59 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] (duration: 00m 59s)
* 23:31 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki-staging/php-1.35.0-wmf.26$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki
* 23:26 Amir1: created wb_items_per_site
* 19:05 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:03 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:00 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 18:58 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:57 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 18:51 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:42 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:22 Urbanecm: Morning SWAT done
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]; take II) (duration: 00m 58s)
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]) (duration: 00m 59s)
* 16:54 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:52 _joe_: parsoid migrated to use envoy for TLS termination
* 16:24 _joe_: switching parsoid-php to envoy for TLS termination
* 15:45 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Label blacklist updates ([[phab:T249285|T249285]]) (duration: 00m 58s)
* 15:36 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:04 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 14:59 addshore: deploy slot done
* 14:55 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:54 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (duration: 00m 57s)
* 14:50 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:49 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (duration: 00m 58s)
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10912 and previous config saved to /var/cache/conftool/dbconfig/20200406-144220-marostegui.json
* 14:41 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (cache bust) (duration: 00m 58s)
* 14:40 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10911 and previous config saved to /var/cache/conftool/dbconfig/20200406-143755-marostegui.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10910 and previous config saved to /var/cache/conftool/dbconfig/20200406-143042-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10909 and previous config saved to /var/cache/conftool/dbconfig/20200406-142607-marostegui.json
* 14:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (cachebust) (duration: 00m 58s)
* 14:23 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:09 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:07 elukey@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:07 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:47 sukhe: upload cescout 0.1.1-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 13:26 elukey: reboot stat1008 as test to verify ROCm 3.3 upgrades
* 13:22 elukey: stat1008 upgraded to ROCm 3.3 (enables Tensorflow 2.x)
* 13:05 ema: cache: upgrade varnish to 5.1.3-1wm13, begin rolling varnish-fe restarts [[phab:T249344|T249344]]
* 13:03 marostegui: Deploy schema change on db1118
* 13:03 jbond42: updating gnutls on buster
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P10906 and previous config saved to /var/cache/conftool/dbconfig/20200406-130320-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 after schema change', diff saved to https://phabricator.wikimedia.org/P10905 and previous config saved to /var/cache/conftool/dbconfig/20200406-130255-marostegui.json
* 12:59 Urbanecm: Creation of grwikimedia is done ([[phab:T245911|T245911]])
* 12:59 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:53 marostegui: Deploy schema change on db1107
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 for schema change', diff saved to https://phabricator.wikimedia.org/P10904 and previous config saved to /var/cache/conftool/dbconfig/20200406-125308-marostegui.json
* 12:52 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P10903 and previous config saved to /var/cache/conftool/dbconfig/20200406-125222-marostegui.json
* 12:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: {{Gerrit|77b9ae9}}: Create grwikimedia
* 12:44 urbanecm@deploy1001: Synchronized dblists/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 59s)
* 12:37 XioNoX: Update eqiad analytics filters with new APT IPs
* 12:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:21 marostegui: Deploy schema change on db1089
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P10902 and previous config saved to /var/cache/conftool/dbconfig/20200406-122123-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10901 and previous config saved to /var/cache/conftool/dbconfig/20200406-122058-marostegui.json
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:04 godog: test grafana 6.7.2 upgrade on grafana2001 - [[phab:T244208|T244208]]
* 11:57 awight: EU swat complete
* 11:53 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TwoColConflict: SWAT: [[gerrit:586309{{!}}Backport talk page and EventLogging changes (T248243, T249404)]] (duration: 00m 59s)
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 11:48 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 11:48 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:586325{{!}}Create account creator and rollback groups on yowiki (T249487)]] (duration: 00m 59s)
* 11:32 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/ContentTranslation: SWAT: [[gerrit:586311{{!}}Avoid failure on restoring draft with no categories (T249400)]] (duration: 01m 02s)
* 11:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: double-syncing (duration: 00m 58s)
* 11:24 marostegui: Deploy schema change on db1105:3311
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10900 and previous config saved to /var/cache/conftool/dbconfig/20200406-112417-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10899 and previous config saved to /var/cache/conftool/dbconfig/20200406-112123-marostegui.json
* 11:18 elukey: import AMD ROCm 3.3 packages in buster-wikimedia (component thirdparty/rocm33) - [[phab:T247082|T247082]]
* 11:17 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:580394{{!}}cirrus: Increase commonswiki near match weight (T245642)]] (duration: 00m 59s)
* 11:11 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:585779{{!}} Whitelist X-Wikimedia-Debug header for cross-wiki API requests (T249107)]] (duration: 00m 59s)
* 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 01m 12s)
* 09:50 XioNoX: push pfw firewall policies - [[phab:T249267|T249267]]
* 09:40 marostegui: Deploy schema change on db1099:3311
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10898 and previous config saved to /var/cache/conftool/dbconfig/20200406-093944-marostegui.json
* 09:11 ema: cp2027: upgrade varnish to 5.1.3-1wm13 and restart varnish-fe [[phab:T249344|T249344]]
* 09:08 ema: upload varnish 5.1.3-1wm13 to buster-wikimedia on apt1001.wm.org [[phab:T249344|T249344]]
* 08:55 ariel@deploy1001: Finished deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link (duration: 00m 09s)
* 08:55 ariel@deploy1001: Started deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link
* 08:54 elukey: bootstrap wdqs200[7,8] - [[phab:T246343|T246343]]
* 08:50 marostegui: Deploy schema change on db1139:3311
* 08:18 _joe_: conversion of codfw api done
* 08:07 marostegui: Deploy schema change on dbstore1003:3311
* 07:54 vgutierrez: rolling restart of ats-tls to disable wmf-analytics log - [[phab:T249335|T249335]] [[phab:T237993|T237993]]
* 07:50 dcausse: search index: deleting stale index wikidatawiki_content_1585224806 on cloudelastic:9243
* 07:49 _joe_: eqiad API migrated to envoy for local TLS termination, now starting codfw
* 07:35 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 as attempt to fix heavy GC runs (old gen) - [[phab:T231517|T231517]]
* 07:35 marostegui: Rename wb_terms on eqiad excluding labsdb1009, labsdb1010, labsdb1011 - [[phab:T248086|T248086]]
* 07:06 marostegui: Rename wb_terms on codfw - [[phab:T248086|T248086]]
* 06:45 XioNoX: delete BGP to AS25074 in amsix
* 06:36 _joe_: converting the api servers to envoy for TLS in eqiad
* 06:30 marostegui: Upgrade dbproxy1019 - [[phab:T231520|T231520]]
* 06:18 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
* 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:50 vgutierrez: ats-tls restart in cp3056, cp3058 and cp3062 - [[phab:T249335|T249335]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P10897 and previous config saved to /var/cache/conftool/dbconfig/20200406-054559-marostegui.json
* 05:18 marostegui: Deploy schema change on db1079 (this will generate lag on s7 labs)
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P10896 and previous config saved to /var/cache/conftool/dbconfig/20200406-051744-marostegui.json
* 05:16 vgutierrez: Enable inbound TLSv1.3 in upload@eqiad - [[phab:T170567|T170567]]
* 05:16 vgutierrez: Enable TLS Session Tickets on eqiad - [[phab:T245616|T245616]]
* 05:03 vgutierrez: ats-tls restart in cp1075, cp1081 and cp1087 - [[phab:T249335|T249335]]
 
== 2020-04-03 ==
* 21:17 andrewbogott: upgraded wikitech-static to 1.34.1
* 17:58 mutante: rsync home dirs from install1002 to apt1001:/srv/home_install1002...
* 15:43 ema: cp3061: restart varnish-fe [[phab:T249344|T249344]]
* 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:18 ema: cp3057: restart varnish-fe [[phab:T249344|T249344]]
* 14:37 hashar: Restarting Jenkins for a CSP parameter [[phab:T245658|T245658]]
* 14:07 vgutierrez: restart ats-tls on cp1087 - [[phab:T249335|T249335]]
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10882 and previous config saved to /var/cache/conftool/dbconfig/20200403-140132-marostegui.json
* 13:55 vgutierrez: restart ats-tls on cp1075 and cp1081 - [[phab:T249335|T249335]]
* 12:49 marostegui: Deploy schema change on db1090:3317
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10881 and previous config saved to /var/cache/conftool/dbconfig/20200403-124908-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P10880 and previous config saved to /var/cache/conftool/dbconfig/20200403-124827-marostegui.json
* 12:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]] (duration: 00m 43s)
* 12:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]]
* 12:27 marostegui: Deploy schema change on db1136
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P10879 and previous config saved to /var/cache/conftool/dbconfig/20200403-122716-marostegui.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P10878 and previous config saved to /var/cache/conftool/dbconfig/20200403-122259-marostegui.json
* 12:00 marostegui: Deploy schema change on db1094
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P10877 and previous config saved to /var/cache/conftool/dbconfig/20200403-115959-marostegui.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20200403-115854-marostegui.json
* 11:40 marostegui: Deploy schema change on db1098:3317
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10875 and previous config saved to /var/cache/conftool/dbconfig/20200403-114004-marostegui.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10874 and previous config saved to /var/cache/conftool/dbconfig/20200403-113717-marostegui.json
* 10:38 marostegui: Deploy schema change on db1101:3317
* 10:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|861b267}}: Enable cswiki anniversary logo ([[phab:T249173|T249173]]) (duration: 01m 02s)
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10872 and previous config saved to /var/cache/conftool/dbconfig/20200403-103746-marostegui.json
* 09:32 marostegui: Deploy schema change on db1116:3317
* 08:43 marostegui: Deploy schema change on dbstore1003:3317
* 07:57 marostegui: Deploy schema change on s7 codfw master, this will generate lag on codfw
* 06:55 XioNoX: add fastnetmon 1.1.4 to buster-wikimedia - [[phab:T240658|T240658]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P10870 and previous config saved to /var/cache/conftool/dbconfig/20200403-062529-marostegui.json
* 05:21 marostegui: Deploy schema change on db1126
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P10869 and previous config saved to /var/cache/conftool/dbconfig/20200403-052115-marostegui.json
* 00:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null ([[phab:T249277|T249277]]) (duration: 01m 00s)
 
== 2020-04-02 ==
* 23:53 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 23:09 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) ([[phab:T248282|T248282]]) (duration: 00m 58s)
* 19:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: [[phab:T249258|T249258]]: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
* 19:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]] (duration: 15m 13s)
* 19:35 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/MovePage.php: [[phab:T248789|T248789]] MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
* 19:30 hashar: docker-pkg update on contint hosts
* 19:30 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
* 19:29 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 19:23 ppchelko@deploy1001: Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]]
* 19:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 19:00 longma: promoting all to 1.35.0-wmf.26
* 18:39 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]] (duration: 01m 05s)
* 18:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 18:37 longma: rolling group1 to 1.35.0-wmf.26
* 18:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: {{Gerrit|4e2a092}}: EditorGateway: Fix handling of null sectionId ([[phab:T249169|T249169]]) (duration: 01m 09s)
* 18:22 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: {{Gerrit|94ded03}}: Fix issues with treating section "numbers" as integers ([[phab:T248795|T248795]]; [[phab:T248968|T248968]]; [[phab:T249112|T249112]]) (duration: 01m 10s)
* 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}} (duration: 03m 21s)
* 17:45 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}}
* 16:53 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
* 16:53 joal@deploy1001: Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
* 16:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: [[phab:T249162|T249162]] Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
* 16:44 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
* 16:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 16:33 jforrester@deploy1001: sync-file aborted: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
* 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
* 16:30 joal@deploy1001: Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
* 16:19 XioNoX: upgrade netflow4001's fastnetmon to 1.1.4 - [[phab:T240658|T240658]]
* 14:56 XioNoX: push new test switch config for cloudvirt2001 - [[phab:T248425|T248425]]
* 14:33 vgutierrez: Enable inbound TLSv1.3 in upload@codfw - [[phab:T170567|T170567]]
* 14:33 vgutierrez: Enable TLS Session tickets in codfw - [[phab:T245616|T245616]]
* 14:24 jbond42: updating bluez on ganeti and cloudvirt
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
* 13:50 marostegui: Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - [[phab:T248967|T248967]]
* 13:44 vgutierrez: update puppet compiler facts
* 13:40 marostegui: Deploy schema change on db1111
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
* 13:32 gehel: OSM data reimport on maps2004 - [[phab:T249086|T249086]]
* 12:55 mutante: mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
* 12:52 mutante: mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
* 11:06 mutante: decom planet1001 ([[phab:T248863|T248863]])
* 10:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:19 marostegui: Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
* 10:17 elukey: set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
* 09:47 marostegui: Remove haproxy@10.64.37.14 from labsdb hosts - [[phab:T231280|T231280]] [[phab:T248944|T248944]]
* 09:44 gehel: CORRECTION: depool maps2004 for data reimport - [[phab:T249086|T249086]]
* 09:40 gehel: depool wdqs2004 for data reimport - [[phab:T249086|T249086]]
* 09:33 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
* 09:32 oblivian@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 09:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
* 09:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
* 08:51 marostegui: Deploy schema change db1104
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
* 08:28 gehel: repooling wdqs1006 - caught up on lag
* 08:22 vgutierrez: Enable inbound TLSv1.3 in upload@esams - [[phab:T170567|T170567]]
* 08:21 vgutierrez: Enable TLS Session tickets in esams - [[phab:T245616|T245616]]
* 07:45 moritzm: bounced ferm on ms-be1040
* 07:27 marostegui: Deploy schema change on db1092
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
* 05:49 marostegui: Deploy schema change on db1101:3318
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
* 05:29 elukey: powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)
 
== 2020-04-01 ==
* 22:44 volker-e@deploy1001: Finished deploy [design/style-guide@4bfe647]: Deploy design/style-guide:  (duration: 00m 08s)
* 22:43 volker-e@deploy1001: Started deploy [design/style-guide@4bfe647]: Deploy design/style-guide:
* 22:02 volans: forcing logrotate on netflow2001 to compress yesterday's logs
* 21:53 volans: force-rebooting ms-be1023, unresponsive - [[phab:T249174|T249174]]
* 21:50 volans: stopped and restarted kafkatee-webrequest.service on netflow2001, was in a restart loop
* 19:48 marxarelli: rollback of 1.35.0-wmf.26 from group1 ([[phab:T247773|T247773]]). blocked by [[phab:T249162|T249162]]
* 19:30 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback 1.35.0-wmf.26 from group1
* 19:21 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26 (duration: 01m 06s)
* 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26
* 19:18 marxarelli: promoting group1 to 1.35.0-wmf.26
* 17:21 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqord*' commit 'enable sampling on eqord Iac15379cc'
* 16:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqdfw*' commit 'enable sampling on eqdfw Iac15379cc'
* 16:39 vgutierrez: pool cp2027 - [[phab:T248816|T248816]]
* 16:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:17 ariel@deploy1001: Finished deploy [dumps/dumps@21363c1]: page range prefetch fixup (duration: 00m 09s)
* 16:17 ariel@deploy1001: Started deploy [dumps/dumps@21363c1]: page range prefetch fixup
* 15:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:31 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:27 vgutierrez: depool & decommission cp20[16,19,23,27] - [[phab:T249125|T249125]]
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10845 and previous config saved to /var/cache/conftool/dbconfig/20200401-152258-marostegui.json
* 15:11 herron: performing kafka-main rolling restarts to pick up security updates
* 14:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 vgutierrez: depool && decommission cp[2018,2020,2022,2024-2026].codfw.wmnet - [[phab:T249115|T249115]]
* 14:32 gehel: depooling wdqs1006 to allow catching up on lag
* 14:30 vgutierrez: pool cp2042 - [[phab:T248816|T248816]]
* 14:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 XioNoX: remove AS-path prepending in esams
* 13:47 XioNoX: remove AS-path prepending in eqsin
* 13:39 vgutierrez: pool cp2041 - [[phab:T248816|T248816]]
* 13:34 mutante: sodium (mirror): sudo -u mirror ftpsync to get Debian mirror updated (Icinga says it's old)
* 13:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 marostegui: Deploy schema change on db1099:3318
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10843 and previous config saved to /var/cache/conftool/dbconfig/20200401-131719-marostegui.json
* 13:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:19 tgr@deploy1001: Synchronized wmf-config/config: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 06s)
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 tgr@deploy1001: Synchronized dblists/growthexperiments.dblist: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 05s)
* 12:17 XioNoX: restart nfacct on netflow4001 for kafka tls tests - [[phab:T248980|T248980]]
* 12:15 vgutierrez: depool & decommission cp2013 - [[phab:T249088|T249088]]
* 12:14 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 06s)
* 12:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585059{{!}}Enable password-reset-update on all other than Wikipedias (T245791)]] (duration: 01m 07s)
* 12:09 marostegui: Deploy schema change on db1116:3318
* 12:05 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons take 2 (duration: 01m 08s)
* 12:04 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons (duration: 01m 05s)
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]; take II) (duration: 01m 05s)
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]) (duration: 01m 07s)
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php:  [SDC] Enable WikibaseQualityConstraints on Commons take II (duration: 01m 06s)
* 11:44 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons (duration: 01m 18s)
* 11:20 cormacparle__: created table wbqc_constraints on commonswiki
* 11:03 jbond42: install bluez update on ganeti-canary and cloudvirt/cloudcontrol-dev
* 11:01 mutante: planet1001 - reinstall OS to test install_server switch, ATS switched to planet1002 earlier
* 10:47 marostegui: Deploy schema change on dbstore1005:3318
* 10:25 vgutierrez: pool cp2040 - [[phab:T248816|T248816]]
* 10:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=canary
* 09:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:37 marostegui: Deploy schema change on s8 codfw, this will generate lag on codfw
* 09:35 XioNoX: Update install servers IPs (dhcp helpers + firewall rules) - [[phab:T224576|T224576]]
* 09:34 mutante: install_servers: DHCP_relay in routers and TFTP server in DHCP server config have been switched from install1002/2002 to install1003/2003 - doing a test install, but if any issues report on [[phab:T224576|T224576]]
* 09:26 marostegui: last entry was for db2093
* 09:26 marostegui: Downgrade mariadb package from 10.4.12-2 to 10.4.12-1
* 09:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:05 mutante: planet - the backend server has been switched from planet1001 (stretch) to planet1002 (buster) - [[phab:T247651|T247651]]
* 08:46 mutante: deneb, boron: systemctl reset-failed to clear up systemd state alerts
* 08:43 marostegui: Stop haproxy on dbproxy1010 [[phab:T248944|T248944]]
* 08:37 jynus: restart bacula at backup1001
* 08:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:28 vgutierrez: depool & decommission cp2017 - [[phab:T249084|T249084]]
* 08:21 vgutierrez: pool cp2039 - [[phab:T248816|T248816]]
* 08:09 marostegui: Deploy schema change on db1138 (s4 primary master)
* 08:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 after schema change', diff saved to https://phabricator.wikimedia.org/P10841 and previous config saved to /var/cache/conftool/dbconfig/20200401-071339-marostegui.json
* 07:12 vgutierrez: pool cp2038 - [[phab:T248816|T248816]]
* 06:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 06:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 vgutierrez: depool & decommission cp2012 - [[phab:T249080|T249080]]
* 06:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui: Deploy schema change on db1121 (this will create lag on s4 labs)
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P10840 and previous config saved to /var/cache/conftool/dbconfig/20200401-053827-marostegui.json
* 00:39 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Update http and prot rel links to https, fix link to sitelist in MW Core (duration: 01m 06s)
* 00:12 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Add export-0.11 (duration: 01m 05s)
 
== 2020-03-31 ==
* 22:23 marxarelli: group0 to 1.35.0-wmf.26 ([[phab:T247773|T247773]]); no rise in error rates following redeployment
* 22:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.26
* 22:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 21:54 dduvall@deploy1001: sync aborted: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]]) (duration: 07m 31s)
* 21:47 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 21:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/user/UserNameUtils.php: [[phab:T249045|T249045]] Use wfMessage in UserNameUtils::isUsable for now (duration: 00m 58s)
* 21:05 eileen: process-control config revision is {{Gerrit|f80d248113}} - (catch up dedupe now off - fyi MBeat )
* 20:59 hashar: contint1001: manually reverted /lib/systemd/system/jenkins.service
* 20:51 hashar: Restarting Jenkins for new CSP rules # [[phab:T245658|T245658]]
* 20:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rolling back 1.35.0-wmf.26 testwiki deployment following significant increase in error rate (cc [[phab:T247773|T247773]])
* 20:14 marxarelli: correction: RequestContext::getLanguage errors are for testwiki deployment, pre group0
* 20:08 marxarelli: a slew of "ErrorException from line 334 of /srv/mediawiki/php-1.35.0-wmf.26/includes/context/RequestContext.php: PHP Warning: Recursion detected in RequestContext::getLanguage" after group0 deployment (cc [[phab:T247773|T247773]])
* 20:04 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache (duration: 142m 48s)
* 19:20 ariel@deploy1001: Finished deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly (duration: 00m 04s)
* 19:20 ariel@deploy1001: Started deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly
* 18:08 ariel@deploy1001: Finished deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date (duration: 00m 05s)
* 18:07 ariel@deploy1001: Started deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date
* 17:42 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache
* 17:40 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.23 (duration: 26m 51s)
* 17:38 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad.service on cloudelastic1001 to see if it recovers from a thrashing/GC state - [[phab:T231517|T231517]]
* 16:30 marxarelli: 1.35.0-wmf.26 was branched at {{Gerrit|bec758b668aaa57fc259a1d0ecf3b35340d2661b}} for [[phab:T247773|T247773]]
* 16:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 16:15 vgutierrez: pool cp2037 - [[phab:T248816|T248816]]
* 15:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:35 mutante: decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) [[phab:T247780|T247780]]
* 15:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 vgutierrez: depool & decommission cp2010 - [[phab:T249002|T249002]]
* 15:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245794|T245794]] Enable DiscussionTools as a beta feature on four wikis (duration: 01m 00s)
* 15:05 cdanis: cr1-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 15:01 cdanis: cr2-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 14:43 vgutierrez: pool cp2036 - [[phab:T248816|T248816]]
* 14:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[4-8].eqiad.wmnet
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091 after schema change', diff saved to https://phabricator.wikimedia.org/P10834 and previous config saved to /var/cache/conftool/dbconfig/20200331-141459-marostegui.json
* 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 13:31 vgutierrez: Enable TLS Session tickets in eqsin - [[phab:T245616|T245616]]
* 13:05 XioNoX: update nat on pfw3-codfw - [[phab:T248906|T248906]]
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:49 _joe_: switching all appserver canaries to envoy
* 12:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:45 marostegui: Deploy schema change on db1091
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for schema change', diff saved to https://phabricator.wikimedia.org/P10833 and previous config saved to /var/cache/conftool/dbconfig/20200331-124452-marostegui.json
* 12:34 _joe_: transitioning mw1261 to envoy
* 12:23 vgutierrez: rolling upgrade of ATS to version 8.0.6-1wm5 - [[phab:T248938|T248938]]
* 11:30 Lucas_WMDE: EU SWAT done
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]], take II (duration: 00m 57s)
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]] (duration: 00m 58s)
* 11:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]], take II (duration: 00m 59s)
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]] (duration: 01m 00s)
* 10:46 _joe_: disabled puppet on canary appservers, potentially dangerous change ahead
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084 after schema change', diff saved to https://phabricator.wikimedia.org/P10831 and previous config saved to /var/cache/conftool/dbconfig/20200331-101953-marostegui.json
* 10:03 XioNoX: add BGP to AS41327 in AMS-IX
* 09:49 XioNoX: push homer diffs to mr1-eqsin
* 09:36 XioNoX: push homer diffs to mr1-eqiad
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 09:05 vgutierrez: upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - [[phab:T248938|T248938]]
* 09:00 vgutierrez: depool & decommission cp2011 - [[phab:T248950|T248950]]
* 08:44 vgutierrez: pool cp2035 - [[phab:T248816|T248816]]
* 08:31 mutante: signed puppet cert for planet1002.eqiad.wmnet
* 08:29 marostegui: Depool db1084 for schema change
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for schema change', diff saved to https://phabricator.wikimedia.org/P10829 and previous config saved to /var/cache/conftool/dbconfig/20200331-082904-marostegui.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change', diff saved to https://phabricator.wikimedia.org/P10828 and previous config saved to /var/cache/conftool/dbconfig/20200331-082711-marostegui.json
* 08:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:01 XioNoX: delete unused ROA for ARIN v4 prefixes - [[phab:T235886|T235886]]
* 07:49 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:17 vgutierrez: pool cp2034 - [[phab:T248816|T248816]]
* 07:16 marostegui: Deploy schema change on db1081
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for schema change', diff saved to https://phabricator.wikimedia.org/P10827 and previous config saved to /var/cache/conftool/dbconfig/20200331-071547-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10826 and previous config saved to /var/cache/conftool/dbconfig/20200331-071401-marostegui.json
* 06:48 marostegui: Deploy schema change on db1103:3314
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10825 and previous config saved to /var/cache/conftool/dbconfig/20200331-064707-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10824 and previous config saved to /var/cache/conftool/dbconfig/20200331-064627-marostegui.json
* 05:55 marostegui: Drop nova and nova_api from m5 master (db1133) - [[phab:T248313|T248313]]
* 05:55 kart_: Updated cxserver to 2020-03-30-145349-production ([[phab:T248578|T248578]])
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 05:53 vgutierrez: depool && decommission cp2007 - [[phab:T248941|T248941]]
* 05:48 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:46 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:46 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:26 marostegui: Deploy schema change on db1097:3314
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10822 and previous config saved to /var/cache/conftool/dbconfig/20200331-051354-marostegui.json
* 00:26 eileen: civicrm revision changed from {{Gerrit|cf2e2c11c3}} to {{Gerrit|524b162174}}, config revision is {{Gerrit|708198a154}}
 
== 2020-03-30 ==
* 23:30 cdanis: cr3-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Alphabetize wikis in each GrowthExperiments settings (duration: 00m 58s)
* 23:16 cdanis: cr2-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:08 cdanis: cdanis@cr3-knams# commit comment "sensible flow table sizes [[phab:T248394|T248394]]"
* 22:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Provide wmgSiteLogoIcon (duration: 00m 57s)
* 22:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgSiteLogoIcon for each project family and four special wikis (duration: 00m 58s)
* 22:50 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Set wgMobileFrontendLogo from wgLogos['icon'] if set (duration: 00m 59s)
* 22:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 57s)
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split wgLogos setting into wmgSiteLogo1x etc. (duration: 00m 59s)
* 22:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Construct wgLogos in CommonSettings so that projects can inherit values (duration: 01m 02s)
* 19:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 ejegg: updated payments listener (standalone SmashPig) from {{Gerrit|dc0c6b208b}} to {{Gerrit|d80e4c5abd}}
* 15:32 vgutierrez: pool cp2033 - [[phab:T248816|T248816]]
* 15:25 jeh: add icinga 2h downtime and soft reset iDRAC on labstore1005.mgmt.eqiad.wmnet [[phab:T247965|T247965]]
* 14:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 vgutierrez: depool & decommission cp2008 - [[phab:T248864|T248864]]
* 14:23 vgutierrez: pool cp2032 - [[phab:T248816|T248816]]
* 14:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:01 vgutierrez: depool & decommission cp2006 - [[phab:T248856|T248856]]
* 13:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 vgutierrez: pool cp2031 - [[phab:T248816|T248816]]
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:53 vgutierrez: depool & decommission cp2005 - [[phab:T248848|T248848]]
* 12:26 cdanis: cdanis@re0.cr2-codfw# set chassis fpc 5 inline-services flex-flow-sizing    cdanis@re0.cr2-codfw# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 12:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:21 vgutierrez: depool & decommission cp2004 - [[phab:T248824|T248824]]
* 12:03 XioNoX: delete unused ROA for ARIN v6 prefixes - [[phab:T235886|T235886]]
* 11:59 XioNoX: delete unused ROAs for RIPE prefixes - [[phab:T235886|T235886]]
* 11:42 mutante: miscweb2002 - race condition with apache2 mpm and php7.3 module met - a2dismod mpm_event ; systemctl restart apache2 ; puppet agent -tv (also see [[phab:T196968|T196968]], https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) [[phab:T247887|T247887]]
* 11:37 mutante: miscweb2002 - installed OS, added to puppet, added role and ... sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config ([[phab:T247648|T247648]])
* 11:30 marostegui: Deploy schema change on dbstore1004:3314
* 11:22 XioNoX: delete ARIN allocations from RIPE's IRR - [[phab:T235886|T235886]]
* 11:11 Urbanecm: EU SWAT done
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]; take II) (duration: 00m 58s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]) (duration: 00m 58s)
* 11:08 vgutierrez: pool cp2030 - [[phab:T248816|T248816]]
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]; take II) (duration: 00m 59s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]) (duration: 00m 59s)
* 10:43 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:34 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 10:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:59 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata JSON dumps start at 9:59 UTC today ([[phab:T248612|T248612]])
* 09:56 hoo@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/Wikibase/repo/maintenance/DumpEntities.php: DumpEntities: Fix DB group default override ([[phab:T248612|T248612]]) (duration: 01m 02s)
* 09:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 vgutierrez: pool cp2029 - [[phab:T248816|T248816]]
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 vgutierrez: depool & decommission cp2002 - [[phab:T248818|T248818]]
* 07:48 marostegui: Run cloudcontrol1003:~# wmcs-wikireplica-dns to promote dbproxy1018 to wikireplicas active proxy [[phab:T231520|T231520]]
* 07:40 marostegui: Replace dbproxy1010 with dbproxy1011 for wiki replicas, analytics - [[phab:T231520|T231520]]
* 07:28 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T248333|T248333]]
* 07:26 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T248333|T248333]]
* 07:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 07:10 vgutierrez: depool and decommission cp2001 - [[phab:T248815|T248815]]
* 06:52 vgutierrez: pool cp2028 - [[phab:T247340|T247340]]
* 06:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10813 and previous config saved to /var/cache/conftool/dbconfig/20200330-062858-marostegui.json
* 06:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui: Deploy schema change on db1074 with replication, this will generate lag on s2 labs
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P10812 and previous config saved to /var/cache/conftool/dbconfig/20200330-060338-marostegui.json
* 05:40 vgutierrez: pool cp2027 - [[phab:T247340|T247340]]
* 05:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 vgutierrez: Enable TLS Session tickets in ulsfo - [[phab:T245616|T245616]]
* 04:32 vgutierrez: upgrade ATS to version 8.0.6-1wm4 on ulsfo - [[phab:T245616|T245616]]
 
== 2020-03-29 ==
* 08:24 elukey: powercycle elastic1059 - mgmt/serial console stuck, no ssh - racadm getsel shows a lot of OEM errors occurred, nothing specific
 
== 2020-03-28 ==
* 16:54 elukey: restart yarn on analytics1071
* 12:05 vgutierrez: preemptive restart of ats-tls on cp1081 and cp3062 - [[phab:T248736|T248736]]
* 11:32 vgutierrez: restart ats-tls on cp1077 - [[phab:T248736|T248736]]
* 08:34 vgutierrez: pool cp1089
* 08:30 vgutierrez: restarting ats-tls on cp1089
 
== 2020-03-27 ==
* 20:51 ejegg: updated payments-wiki from {{Gerrit|db618f429d}} to {{Gerrit|1640f5e21e}}
* 15:15 andrew@deploy1001: Finished deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens (duration: 03m 35s)
* 15:12 andrew@deploy1001: Started deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P10807 and previous config saved to /var/cache/conftool/dbconfig/20200327-144125-marostegui.json
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P10806 and previous config saved to /var/cache/conftool/dbconfig/20200327-142240-marostegui.json
* 14:19 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10805 and previous config saved to /var/cache/conftool/dbconfig/20200327-133022-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P10804 and previous config saved to /var/cache/conftool/dbconfig/20200327-130706-marostegui.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10803 and previous config saved to /var/cache/conftool/dbconfig/20200327-130542-marostegui.json
* 12:49 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --interface-admin
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10802 and previous config saved to /var/cache/conftool/dbconfig/20200327-122144-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10801 and previous config saved to /var/cache/conftool/dbconfig/20200327-122058-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10800 and previous config saved to /var/cache/conftool/dbconfig/20200327-120234-marostegui.json
* 11:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase202[123].codfw.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[123].codfw.wmnet
* 11:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2023.codfw.wmnet
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2022.codfw.wmnet
* 11:44 oblivian@puppetmaster1001: conftool action : edit; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[1].codfw.wmnet
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2021.codfw.wmnet
* 10:55 mutante: revoke puppet cert webserver-misc-apps.discovery.wmnet and recreate with additional SANs for new VMs
* 10:45 mutante: miscweb1002 - upload and unpack RackTables-0.21.4 ([[phab:T247646|T247646]] [[phab:T247648|T247648]])
* 10:28 marostegui: Alter db2125 s2 to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 10:12 mutante: miscweb1002 - sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config  [[phab:T247648|T247648]]
* 10:04 vgutierrez: upload trafficserver 8.0.6-1wm4 to apt.wm.o (buster) - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 10:03 mutante: sodium - find /srv/mirrors/debian/ -user root -exec chown -h mirror:mirror <nowiki>{</nowiki><nowiki>}</nowiki> \;  (-h to also fix symbolic links); sudo -u mirror ftpsync ([[phab:T248660|T248660]])
* 10:02 marostegui: Alter db2084:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 10:01 marostegui: Alter db1096:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 09:37 mutante: sodium - running ftpsync as user mirror ([[phab:T248660|T248660]])
* 09:36 mutante: sodium fixing root owned files in /srv/mirrors/debian to be owned by mirror:mirror ([[phab:T248660|T248660]])
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P10799 and previous config saved to /var/cache/conftool/dbconfig/20200327-093214-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10798 and previous config saved to /var/cache/conftool/dbconfig/20200327-093106-marostegui.json
* 07:58 marostegui: Deploy schema change on s2 codfw - this will generate lag on s2 codfw - [[phab:T248333|T248333]]
* 07:36 elukey: execute 'rm /etc/logrotate.d/ceph-common' on cloudvirt[1,2]* and cloudcontrol* to stop daily cronspam (file not in the puppet catalog anymore)
* 07:32 moritzm: installing grub2 updates from Stretch point release
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P10796 and previous config saved to /var/cache/conftool/dbconfig/20200327-072334-marostegui.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P10795 and previous config saved to /var/cache/conftool/dbconfig/20200327-070224-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P10794 and previous config saved to /var/cache/conftool/dbconfig/20200327-070014-marostegui.json
* 06:31 marostegui: Deploy schema change on db1082, this will generate lag on s5 labs
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P10793 and previous config saved to /var/cache/conftool/dbconfig/20200327-063042-marostegui.json
 
== 2020-03-26 ==
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]; take II) (duration: 00m 57s)
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]) (duration: 00m 58s)
* 22:51 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/user/UserRightsProxy.php: {{Gerrit|I9121f5aae}} (4/4) (duration: 00m 58s)
* 22:50 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/search/SearchMySQL.php: {{Gerrit|I9121f5aae}} (3/4) (duration: 00m 58s)
* 22:48 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/objectcache/SqlBagOStuff.php: {{Gerrit|I9121f5aae}} (2/4) (duration: 00m 58s)
* 22:44 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/jobqueue/jobs/RecentChangesUpdateJob.php: {{Gerrit|I9121f5aae}} (1/4) (duration: 01m 00s)
* 22:05 ejegg: updated fundraising CiviCRM from {{Gerrit|f1cb23e809}} to {{Gerrit|cf2e2c11c3}}
* 21:43 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/MachineVision: Fix: Stop sorting label suggestions by Wikidata ID in ApiQueryImageLabels (duration: 01m 00s)
* 21:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:32 cdanis: cdanis@re0.cr1-eqsin# set chassis afeb slot 0 inline-services flex-flow-sizing    cdanis@re0.cr1-eqsin# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 21:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:27 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}} (duration: 03m 07s)
* 21:24 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}}
* 21:15 cdanis: repool ulsfo
* 21:12 cdanis: applied flow-table-size configuration to cr4-ulsfo which did not need a reboot to apply it [[phab:T248394|T248394]]
* 20:51 cdanis: cdanis@cr3-ulsfo> request system reboot
* 20:36 cdanis: depool ulsfo
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:34 XioNoX: stop exchanging full BGP view between eqiad and codfw - [[phab:T246721|T246721]]
* 16:19 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 XioNoX: stop advertising 208.80.152.0/22 from eqiad - [[phab:T246721|T246721]]
* 16:15 mutante: signing puppet cert for miscweb1002, installed buster, added insetup role ([[phab:T247887|T247887]])
* 16:15 ebernhardson: set cloudelastic-chi wikidatawiki_content to 0 replicas while reindexing
* 16:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 moritzm: rebooting mw2150 for some tests
* 16:12 XioNoX: stop advertising 2620:0:860::/46 from eqiad - [[phab:T246721|T246721]]
* 16:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:58 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:53 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:51 moritzm: installing grub2 updates from Stretch point release
* 15:49 XioNoX: start advertising 208.80.154.0/23 from eqiad - [[phab:T246721|T246721]]
* 15:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:40 XioNoX: start advertising 2620:0:861::/48 from eqiad - [[phab:T246721|T246721]]
* 15:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 mutante: [[phab:T247887|T247887]] - create Ganeti VM miscweb1002.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row C with 1 vCPUs, 2GB of RAM, 20GB of disk in the private network.
* 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P10787 and previous config saved to /var/cache/conftool/dbconfig/20200326-135625-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P10786 and previous config saved to /var/cache/conftool/dbconfig/20200326-132940-marostegui.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10785 and previous config saved to /var/cache/conftool/dbconfig/20200326-130122-marostegui.json
* 12:57 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-main to use envoy [[phab:T244843|T244843]] (duration: 01m 07s)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10784 and previous config saved to /var/cache/conftool/dbconfig/20200326-123302-marostegui.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10783 and previous config saved to /var/cache/conftool/dbconfig/20200326-123157-marostegui.json
* 12:25 mutante: analytics1028 - performing a puppet change on every run (all other hosts doing this were fixed just recently)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10782 and previous config saved to /var/cache/conftool/dbconfig/20200326-121859-marostegui.json
* 11:38 awight: EU SWAT done
* 11:37 awight@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/TwoColConflict: SWAT: [[gerrit:583576{{!}}Two hotfixes for guided tour (T248465)]] (duration: 01m 07s)
* 11:25 mutante: sodium - running ftpsync to get Debian mirror in sync
* 11:23 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 05s)
* 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 06s)
* 11:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/ContentTranslation/modules/ui/mw.cx.ui.Categories.js: SWAT: {{Gerrit|1ea6bad}}: Allow publishing to continue even with broken categories ([[phab:T248302|T248302]]) (duration: 01m 07s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|d1bb0b1}}: Removed expired throttle.php entries (duration: 01m 09s)
* 11:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:58 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:16 XioNoX: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 - [[phab:T207753|T207753]]
* 09:50 elukey: reboot stat1008 - gpu + drivers in a weird state after multiple tests
* 09:00 XioNoX: push v4 conditional advertising on cr3-knams - [[phab:T236785|T236785]]
* 08:44 marostegui: Deploy schema change on s5 codfw, lag will show up on codfw - [[phab:T248333|T248333]]
* 08:27 XioNoX: troubleshot v6 conditional advertisement from cr3-knams - [[phab:T236785|T236785]]
* 07:58 XioNoX: remove BGP session to AS8001 in eqiad (down and not replying to email)
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P10781 and previous config saved to /var/cache/conftool/dbconfig/20200326-074033-marostegui.json
* 07:31 marostegui: Deploy schema change on db1085, lag will appear on s6 on labs
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P10780 and previous config saved to /var/cache/conftool/dbconfig/20200326-073048-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P10779 and previous config saved to /var/cache/conftool/dbconfig/20200326-070746-marostegui.json
* 06:59 marostegui: Deploy schema change on db1093
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P10778 and previous config saved to /var/cache/conftool/dbconfig/20200326-065929-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P10777 and previous config saved to /var/cache/conftool/dbconfig/20200326-065814-marostegui.json
* 06:48 marostegui: Deploy schema change on db1088
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P10776 and previous config saved to /var/cache/conftool/dbconfig/20200326-064748-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10775 and previous config saved to /var/cache/conftool/dbconfig/20200326-064648-marostegui.json
* 06:39 marostegui: Deploy schema change on db1098:3316
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10774 and previous config saved to /var/cache/conftool/dbconfig/20200326-063844-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10773 and previous config saved to /var/cache/conftool/dbconfig/20200326-063633-marostegui.json
* 06:26 marostegui: Deploy schema change on db1096:3316
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10772 and previous config saved to /var/cache/conftool/dbconfig/20200326-062631-marostegui.json
* 06:22 marostegui: Rename nova and nova_api tables on db1117:3325 - [[phab:T248313|T248313]]
* 00:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on testwiki ([[phab:T247645|T247645]]) (duration: 03m 14s)
 
== 2020-03-25 ==
* 23:49 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add investigate to $wgAvailableRights ([[phab:T247645|T247645]]) (duration: 03m 16s)
* 23:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Retry because mw1251 timed out, and it is a proxy (duration: 03m 15s)
* 23:38 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Add new investigate right ([[phab:T247645|T247645]]) (duration: 03m 17s)
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:05 rlazarus: updating eventgate-logging-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints (duration: 02m 57s)
* 21:56 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:46 urandom: dropping unused Cassandra keyspaces -- [[phab:T248018|T248018]]
* 21:45 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:44 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:44 rlazarus: updating eventgate-analytics-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:16 rlazarus: holding off on updating eventgate-analytics until EU time, to check on unexpected helmfile diffs [[phab:T246868|T246868]]
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:07 rlazarus: updating eventgate-analytics to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:36 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 20:32 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 20:22 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 20:22 rlazarus: updating cxserver to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:19 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 20:19 rlazarus: updating citoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:36 hasharDinner: Jenkins restarted on all machines
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:29 rlazarus: updating eventstreams to envoy 1.13.1 [[phab:T246868|T246868]]
* 19:28 twentyafterfour: group1 looks good after deploying wmf.25 refs [[phab:T233873|T233873]]
* 19:27 hashar: upgrading Jenkins # [[phab:T248122|T248122]]
* 19:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.25  refs [[phab:T233873|T233873]]
* 19:26 twentyafterfour: scap sync-proxies failed on mw1251
* 18:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]] (duration: 14m 00s)
* 18:39 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]]
* 18:39 ppchelko@deploy1001: Finished deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints (duration: 14m 28s)
* 18:24 ppchelko@deploy1001: Started deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints
* 18:21 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: re-sync, mw1251 failed (duration: 03m 18s)
* 18:13 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: SWAT: [[gerrit:583393{{!}}Mentorship module: Update for root screen refactor (T248422)]] (duration: 03m 23s)
* 18:06 ppchelko@deploy1001: Finished deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints (duration: 01m 40s)
* 18:05 ppchelko@deploy1001: Started deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints
* 17:43 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:38 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:33 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 16:50 moritzm: installing python-bleach security updates
* 16:47 moritzm: updated jenkins packages on apt.wikimedia.org to 2.222.1
* 16:33 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:32 sukhe: upload cescout 0.1.0-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 16:17 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:15 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 16:07 rlazarus: updating blubberoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2115 after reimage to Buster', diff saved to https://phabricator.wikimedia.org/P10767 and previous config saved to /var/cache/conftool/dbconfig/20200325-152148-marostegui.json
* 15:14 moritzm: installing deneb.codfw.wmnet [[phab:T248165|T248165]]
* 14:51 cdanis: repool codfw [[phab:T248394|T248394]]
* 14:46 mutante: closed port 80 for caching servers on misc backends https://gerrit.wikimedia.org/r/q/topic:%22applayer-tls%22+(status:open%20OR%20status:merged) as final step per service on [[phab:T210411|T210411]]
* 14:39 mutante: static microsites (annual.wikimedia.org, research.wikimedia.org, static-bugzilla etc). closed port 80 for caching servers, finalizing switch to https behind caching servers
* 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:48 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:26 _joe_: cumin A:puppetmaster 'apt-get -y install puppet-common'
* 13:03 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 marostegui: Deploy schema change on db1139:3316
* 12:45 marostegui: Stop MySQL on db2115 for reimage to buster
* 11:50 cdanis: cr1-codfw: `set chassis fpc 5 inline-services flex-flow-sizing` and `request chassis fpc restart slot 5` [[phab:T248394|T248394]]
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 for upgrade', diff saved to https://phabricator.wikimedia.org/P10763 and previous config saved to /var/cache/conftool/dbconfig/20200325-114655-marostegui.json
* 11:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:37 mutante: decom mw1250 - mw1253
* 11:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:35 cdanis: depool codfw for router maintenance [[phab:T248394|T248394]]
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:32 mutante: decom mw1232 - mw1235
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:27 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[0-3].eqiad.wmnet
* 11:26 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[2-5].eqiad.wmnet
* 11:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:21 Urbanecm: EU SWAT done
* 11:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[2-5].eqiad.wmnet
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|59412db}}: Add gwtoolset to available rights to allow granting to global groups (duration: 01m 07s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|7b8d7c5}}: TwoColConflict: Limited default deployment CommonSettings.php ([[phab:T244863|T244863]]) (duration: 01m 06s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]; take II) (duration: 01m 06s)
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]) (duration: 01m 17s)
* 11:08 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1091 load, increase main traffic on all other s4 instances', diff saved to https://phabricator.wikimedia.org/P10762 and previous config saved to /var/cache/conftool/dbconfig/20200325-110821-jynus.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1137', diff saved to https://phabricator.wikimedia.org/P10761 and previous config saved to /var/cache/conftool/dbconfig/20200325-105503-marostegui.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json
* 10:37 XioNoX: change aggregate policy for 2620:0:862::/48 on cr3-knams - [[phab:T236785|T236785]]
* 10:19 XioNoX: change aggregate policy for v4 prefixes on cr2-eqdfw - [[phab:T236785|T236785]]
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 09:56 XioNoX: change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - [[phab:T236785|T236785]]
* 09:54 vgutierrez: Enable inbound TLSv1.3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:23 vgutierrez: upgrade ATS to 8.0.6-1wm3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 marostegui: Reimage db1137
* 08:18 marostegui: Reboot db1117 for full-upgrade
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 08:14 _joe_: upgrading all eventgate-main to envoy 1.13.1 [[phab:T246868|T246868]]
* 08:12 marostegui: Stop all mysql daemons on db1117
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 07:42 XioNoX: reboot scs-eqsin for CPU usage
* 07:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json
* 06:57 marostegui: Deploy schema change on db2129 (s6 codfw master)
* 06:15 marostegui: Rename tables on db1133 (m5 master) nova_api database - [[phab:T248313|T248313]]
* 06:13 marostegui: Remove grants 'nova'@'208.80.154.23' on nova.* - [[phab:T248313|T248313]]
 
== 2020-03-24 ==
* 20:53 cdanis: repool eqsin
* 20:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s)
* 20:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 20:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s)
* 20:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs [[phab:T233873|T233873]]
* 20:32 twentyafterfour@deploy1001: Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 20:31 cdanis: rebooting cr2-eqsin [[phab:T248394|T248394]]
* 20:30 twentyafterfour@deploy1001: Synchronized wmf-config: Now sync InitialiseSettings* refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 20:28 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs [[phab:T248409|T248409]] (duration: 00m 58s)
* 20:27 volans: force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console)
* 20:26 cdanis: commit flow-table-size on cr2-eqsin [[phab:T248394|T248394]]
* 20:19 cdanis: eqsin depooled for router maintenance at 16:15
* 19:29 twentyafterfour@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 19:29 twentyafterfour: rolling back to wmf.24 due to high error rate refs [[phab:T233873|T233873]]
* 19:28 twentyafterfour@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 18:49 gehel: repooling wdqs1006, caught up on lag
* 17:12 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]] (duration: 77m 52s)
* 17:10 ebernhardson: update cloudelastic-chi replica counts from 2 to 1 [[phab:T231517|T231517]]
* 16:41 moritzm: installing linux-perf updates on stretch
* 16:31 moritzm: installing linux-perf-4.19 updates on buster
* 15:58 mutante: installing OS on otrs1001.eqiad.wmnet ([[phab:T248028|T248028]])
* 15:55 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]]
* 15:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:31 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.22 (duration: 02m 02s)
* 15:29 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.21 (duration: 24m 00s)
* 15:17 hashar: Cleaning old MediaWiki deployments # [[phab:T233873|T233873]]
* 15:03 hashar: Applied patches to 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 14:59 hashar: scap prep 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 14:55 gehel: depooling wdqs1006 to catch up on lag
* 14:28 marostegui: Deploy schema change on db2117 (s6)
* 14:26 hashar: Branching wmf/1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 13:22 moritzm: installing glib2.0 updates from Stretch point release
* 13:04 moritzm: installing mariadb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 12:16 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Toroid~huwiki' 'Toroidt' ([[phab:T248371|T248371]])
* 12:10 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Erika Greenberg' 'Copperqueen' ([[phab:T248371|T248371]])
* 11:57 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Romy merdeka' 'Romy_Dwi_Laksono' ([[phab:T248371|T248371]])
* 11:55 marostegui: Deploy schema change on db2087 db2089 db2097
* 11:34 Urbanecm: EU SWAT done
* 11:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]; take II) (duration: 00m 59s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 00m 59s)
* 11:25 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 01m 03s)
* 10:08 gehel: restart blazegraph and updater on wdqs1004
* 09:41 marostegui: Deploy schema change on db2076 (s6)
* 08:39 marostegui: Rename nova database tables on db1133 (m5 master) - [[phab:T248313|T248313]]
* 08:25 marostegui: Rename wikidatawiki.wb_terms on db1104 - [[phab:T248086|T248086]]
* 07:33 elukey: restart update-openstack-mirror.service on sodium
* 06:55 marostegui: Reboot dbproxy1018
* 06:42 marostegui: Reboot dbproxy1019
* 06:16 marostegui: Create empty database testreduce on m5 master [[phab:T245408|T245408]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1087, vslow s8, with weight 1 as it originally had', diff saved to https://phabricator.wikimedia.org/P10753 and previous config saved to /var/cache/conftool/dbconfig/20200324-060133-marostegui.json
 
== 2020-03-23 ==
* 21:50 krinkle@deploy1001: Synchronized docroot/noc/css/vector.css: {{Gerrit|I627a0ddba5}} (duration: 01m 02s)
* 21:39 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@26aa5c3]: Update recommendation-api to {{Gerrit|3141cb6}} (duration: 03m 21s)
* 18:45 Urbanecm: Morning SWAT done
* 18:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]; take II) (duration: 00m 59s)
* 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]) (duration: 01m 00s)
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]; take II) (duration: 00m 59s)
* 18:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 18:32 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: {{Gerrit|cbda0e5}}: ApiVisualEditorEdit: Fix handling of minor parameter ([[phab:T248257|T248257]]) (duration: 01m 00s)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|212114e}}: Dont try to grant `oathauth-enable` to `*` ([[phab:T248282|T248282]]) (duration: 00m 59s)
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]], take II) (duration: 00m 59s)
* 18:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]; take II) (duration: 00m 59s)
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]) (duration: 01m 00s)
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:08 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:31 ema: upload atskafka 0.5 to buster-wikimedia [[phab:T237993|T237993]]
* 15:59 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable client side error logging for group0 and hawwiki - [[phab:T226986|T226986]] (take 2) (duration: 00m 59s)
* 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable client side error logging for group0 and hawwiki - [[phab:T226986|T226986]] (duration: 01m 00s)
* 15:32 moritzm: installing mariadb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 15:24 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:13 moritzm: installing freetype updates from Stretch point release
* 15:04 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (take 2) (duration: 00m 59s)
* 15:00 jynus@cumin1001: dbctl commit (dc=all): 'Remove db1089 for special groups (rc)', diff saved to https://phabricator.wikimedia.org/P10749 and previous config saved to /var/cache/conftool/dbconfig/20200323-150046-jynus.json
* 15:00 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (duration: 01m 01s)
* 14:46 jynus@cumin1001: dbctl commit (dc=all): 'Finish doubling db1107 main s1 traffic', diff saved to https://phabricator.wikimedia.org/P10748 and previous config saved to /var/cache/conftool/dbconfig/20200323-144612-jynus.json
* 14:40 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1107 main s1 traffic a 50%', diff saved to https://phabricator.wikimedia.org/P10747 and previous config saved to /var/cache/conftool/dbconfig/20200323-144005-jynus.json
* 14:35 jynus@cumin1001: dbctl commit (dc=all): 'remove db1107 from special groups', diff saved to https://phabricator.wikimedia.org/P10746 and previous config saved to /var/cache/conftool/dbconfig/20200323-143536-jynus.json
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily disable client side error logging for a deploy - [[phab:T226986|T226986]] (duration: 01m 01s)
* 13:33 moritzm: installing python-cryptography updates from Stretch point release
* 12:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 tgr@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/OAuth/includes/frontend/specialpages/SpecialMWOAuthManageMyGrants.php: SWAT: [[gerrit:582768{{!}}Get consumerKey from consumerId not from acceptanceId (T247531)]] (duration: 01m 01s)
* 11:32 ema: cp1081: restart prometheus-trafficserver-tls-exporter.service
* 11:27 elukey: upload oozie 4.3.0-3 to thirdparty/bigtop14 on wikimedia-stretch - [[phab:T244499|T244499]]
* 10:37 jbond42: switch idp1001 to tlsproxy::envoy profile
* 08:07 marostegui: Start m1 and m2 on db1117
* 08:04 marostegui: Stop m1 and m2 on db1117 to transfer them to db1077 - this will trigger dbproxies IRC alert
* 08:03 moritzm: installing python-cryptography bug fix updates from Stretch point release
* 07:46 marostegui: Stop MySQL on db1077 (non used) for 10.4 upgrade and gtid_domain_id on multisource [[phab:T149418|T149418]]
 
== 2020-03-22 ==
* 23:19 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T248274|T248274]] (duration: 01m 19s)
* 04:37 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
 
== 2020-03-20 ==
* 23:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:04 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:41 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[4-9].eqiad.wmnet
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[0-1].eqiad.wmnet
* 20:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[7-9].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[4-9].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-1].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[7-9].eqiad.wmnet
* 15:44 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/ActorMigration.php: Avoid upsert() log warning spam in ActorMigration due to unique key array format - [[phab:T248147|T248147]] (duration: 01m 01s)
* 13:34 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:33 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10741 and previous config saved to /var/cache/conftool/dbconfig/20200320-121628-marostegui.json
* 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 11:10 elukey: upload oozie 4.3.0-2 packages to thirdparty/bigtop14 on wikimedia-stretch
* 10:56 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:56 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:29 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:13 dcausse: repooling wdqs1006
* 09:28 moritzm: rolling restart of FPM on mw1261-mw1265 for freetype update
* 08:59 moritzm: installing freetype bugfix updates from stretch point release
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1017', diff saved to https://phabricator.wikimedia.org/P10739 and previous config saved to /var/cache/conftool/dbconfig/20200320-084730-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10738 and previous config saved to /var/cache/conftool/dbconfig/20200320-083334-marostegui.json
* 07:59 XioNoX: reorder LVS BGP neighbors and add descriptions - https://gerrit.wikimedia.org/r/576320
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10737 and previous config saved to /var/cache/conftool/dbconfig/20200320-074816-marostegui.json
* 07:46 elukey: upload hadoop_2.8.5-2 (and related debs) to thirdparty/bigtop14 on wikimedia-stretch (manually rebuilt via docker after patch backports from upstream)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10736 and previous config saved to /var/cache/conftool/dbconfig/20200320-073205-marostegui.json
* 07:26 marostegui: Restart mysql on es1017 for upgrade - [[phab:T239791|T239791]]
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1017 for update [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10735 and previous config saved to /var/cache/conftool/dbconfig/20200320-070945-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1014 to es3 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10734 and previous config saved to /var/cache/conftool/dbconfig/20200320-070922-marostegui.json
 
== 2020-03-19 ==
* 22:15 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}} (duration: 05m 13s)
* 22:10 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}}
* 19:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.24
* 18:30 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/ByIdDispatchingEntityInfoBuilder.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part II (duration: 01m 07s)
* 18:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/data-access/src/SingleEntitySourceServices.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part I (duration: 01m 08s)
* 17:47 mutante: releases/releases-jenkins - closed firewall hole to port 80 for caching servers - kept it open just for envoy from the backends - ATS speaks https to them meanwhile
* 16:54 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/RelatedArticles: Do not register "" as a style path, that breaks ResourceLoader - [[phab:T248090|T248090]] (duration: 01m 07s)
* 16:01 jeh@deploy1001: Finished deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule (duration: 03m 31s)
* 15:57 jeh@deploy1001: Started deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule
* 15:19 andrew@deploy1001: deploy aborted: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this) (duration: 00m 00s)
* 14:54 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 14:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 13:32 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.24 (duration: 01m 07s)
* 13:31 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.24
* 13:11 marostegui: Rename testwikidatawiki.wb_terms on db1078 - [[phab:T248086|T248086]]
* 12:33 XioNoX: push frack fw policies [[phab:T248004|T248004]]
* 11:43 Lucas_WMDE: EU SWAT done
* 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.24/includes/OutputPage.php: SWAT: [[gerrit:581245{{!}}OutputPage: Fix warning when setting wgUserNewMsgRevisionId (T248049)]] (duration: 01m 08s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]; take II) (duration: 01m 08s)
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]) (duration: 01m 07s)
* 10:47 ema: upload atskafka 0.4 to buster-wikimedia [[phab:T237993|T237993]]
* 10:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/skin.json: [[gerrit:581248{{!}}skins.vector.styles.legacy needs to define legacy feature (T247566)]] (duration: 01m 08s)
* 10:01 ema: cp: rolling ats-tls-restart to apply log format changes [[phab:T248067|T248067]] [[phab:T237993|T237993]]
* 09:26 marostegui: m2 maintenance window done [[phab:T246098|T246098]]
* 09:03 akosiaris: restart gerrit on gerrit1001 [[phab:T246098|T246098]]
* 09:02 akosiaris: restart otrs-daemon, apache on mendelevium [[phab:T246098|T246098]]
* 09:01 akosiaris: restart recommendation-api on scb [[phab:T246098|T246098]]
* 09:00 marostegui: Restart m2 primary database master - [[phab:T246098|T246098]]
* 08:48 dcausse: depooling wdqs1006 to help it catch up on lag
* 08:43 dcausse: restarting blazegraph on wdqs1006 ([[phab:T242453|T242453]])
* 07:54 moritzm: installing cups updates from Stretch point release
* 07:48 moritzm: installing libjaxen-java security updates from Stretch point release
* 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Update pc1008 spare situation [[phab:T247787|T247787]] (duration: 01m 09s)
* 06:49 elukey: execute 'sudo rm /etc/logrotate.d/ceph-common' on cloudvirt-dev and cloudcontrol-dev to stop daily cronspam
* 06:46 marostegui: Deploy schema change on testcommonswiki.globalimagelinks (empty table) on the s4 master [[phab:T243987|T243987]]
* 06:33 marostegui: Upgrade db1132 without restarting [[phab:T246098|T246098]]
* 00:39 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.24 refs [[phab:T233872|T233872]]
* 00:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: deploy https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581116 which reverts https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581054 refs [[phab:T248010|T248010]] (duration: 01m 07s)
* 00:18 eileen: civicrm revision changed from {{Gerrit|a1b2cbeac1}} to {{Gerrit|1c477ff07f}}, config revision is {{Gerrit|37232d8460}}
 
== 2020-03-18 ==
* 23:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.23/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581114/ refs [[phab:T248010|T248010]] (duration: 01m 07s)
* 23:26 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581115/ (duration: 01m 08s)
* 22:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:18 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:56 Krinkle: krinkle@mw1385: scap pull # clean up AdHoc debugging for [[phab:T248010|T248010]]
* 21:16 brennen@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 06s)
* 21:11 brennen@deploy1001: Synchronized php-1.35.0-wmf.23/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 15s)
* 20:04 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:49 hashar@deploy1001: rebuilt and synchronized wikiversions files: Ensure fleet wide consistency
* 19:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:21 mutante: shutting down (decom cookbook) elnath.codfw.wmnet ([[phab:T188544|T188544]])
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:15 fdans@deploy1001: Finished deploy [analytics/refinery@549f6a4]: deploying analytics refinery (duration: 15m 02s)
* 19:11 hashar: 1.35.0-wmf.24 is on hold: too many blockers
* 19:00 fdans@deploy1001: Started deploy [analytics/refinery@549f6a4]: deploying analytics refinery
* 18:32 Lucas_WMDE: Morning SWAT done
* 18:30 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:579018{{!}}Update linter whitelist w/ parsoid11's IP address (T246833)]] (beta-only) (duration: 01m 04s)
* 18:20 Lucas_WMDE: scap pull on mwdebug1001, attempting to fix mismatched wikiversions alert
* 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 08s)
* 18:13 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]], take II (duration: 01m 07s)
* 18:11 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 07s)
* 16:43 mutante: wtp1025 - Icinga alerted it's running out of disk - 'apt-get clean' lowered disk usage from 97% to 91%
* 16:00 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]] (duration: 61m 23s)
* 14:58 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]]
* 14:41 vgutierrez: disable TLS session tickets in ulsfo - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 14:29 godog: add debug to icinga2001 - [[phab:T247538|T247538]]
* 14:28 _joe_: restarted php-fpm on mw1283, was throwing SIGILL
* 14:17 marostegui: Rename wb_terms on codfw hosts: s8 (wikidatawiki - db2081), s3 (testwikidatawiki - db2109), s4 (commonswiki, testcommonswiki - db2106)  [[phab:T208425|T208425]]
* 14:06 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.23
* 11:59 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 07s)
* 11:57 hashar@deploy1001: Synchronized php-1.35.0-wmf.23/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 10s)
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10715 and previous config saved to /var/cache/conftool/dbconfig/20200318-114259-marostegui.json
* 11:17 ema: upload atskafka 0.3 to buster-wikimedia [[phab:T237993|T237993]]
* 11:16 kart_: EU Mid-day SWAT done
* 11:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]], take II (duration: 01m 07s)
* 11:10 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]] (duration: 01m 07s)
* 10:58 _joe_: setting num_retries=0 on mw2224 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 10:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)
* 10:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]] (duration: 01m 08s)
* 10:52 _joe_: setting num_retries=0, idle_timeout=5s on mw2223 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 10:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]], take II (duration: 01m 07s)
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]] (duration: 01m 07s)
* 10:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 10:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 07s)
* 10:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 10:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 08s)
* 09:43 vgutierrez: enabling inbound TLSv1.3 in upload@ulsfo - [[phab:T170567|T170567]]
* 09:18 vgutierrez: enabling inbound TLSv1.3 in cp4026 - [[phab:T170567|T170567]]
* 08:44 marostegui: Start replication pc1008 from pc1010 to get some of the new keys so it is not fully empty - [[phab:T247787|T247787]]
* 08:14 vgutierrez: upgrade ATS to 8.0.6-1wm3 in ulsfo - [[phab:T170567|T170567]]
* 07:55 moritzm: installing remaining libxslt security updates
* 07:40 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-analytics to use envoy everywhere (duration: 01m 10s)
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:31 marostegui: Reboot pc1008 to try to get its RAID redone - [[phab:T247787|T247787]]
* 00:31 Amir1: foreachwikiindblist medium deleteEqualMessages.php --delete ([[phab:T247562|T247562]])
* 00:10 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 02m 29s)
* 00:08 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
* 00:07 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 01m 17s)
* 00:06 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
 
== 2020-03-17 ==
* 22:49 Amir1: warming up cache for Q80M to Q88M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 22:17 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}} (duration: 06m 08s)
* 22:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}}
* 21:54 Krinkle: krinkle@mw2170$ disable-puppet (Testing for [[phab:T99740|T99740]])
* 21:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting (again) ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 21:10 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 20:50 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters, take 2 ([[phab:T244974|T244974]]) (duration: 01m 12s)
* 20:33 mutante: boron - systemctl start docker-reporter-k8s-images ; systemctl start docker-reporter-releng-images
* 20:31 mutante: boron - had degraded systemd state in Icinga - systemctl start docker-reporter-base-images
* 19:54 mutante: miscweb1001 - restarted ferm, reverted live hack
* 19:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]] (duration: 14m 31s)
* 19:51 mutante: miscweb1001 - testing if ferm 80 firewall hole is needed for envoy, temp. disabled puppet, restarted ferm
* 19:38 ppchelko@deploy1001: Started deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]]
* 19:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]], take II (duration: 01m 06s)
* 19:00 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]] (duration: 01m 07s)
* 18:53 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580390{{!}}Do not lock rows when there's no term returned (T247553 T246898)]], To catch the train (duration: 01m 08s)
* 18:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:39 mutante: removing mw1238 through mw1243 - decom with cookbook ([[phab:T247780|T247780]] [[phab:T245099|T245099]])
* 18:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[8-9].eqiad.wmnet
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[0-3].eqiad.wmnet
* 18:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:01 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}} (duration: 06m 06s)
* 18:00 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:58 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:56 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/languages/LanguageConverter.php: [[gerrit:580361{{!}}languages: Don't assume  in LanguageConverter (T235360)]] (duration: 01m 07s)
* 17:55 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}}
* 17:55 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-3].eqiad.wmnet
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[89].eqiad.wmnet
* 17:52 Amir1: warming up cache for Q70M to Q80M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 17:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580352{{!}}Do not lock rows when there's no term returned (T247553 T246898)]] (duration: 01m 07s)
* 17:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:37 ejegg: updated payments-wiki from {{Gerrit|86ce0361f9}} to {{Gerrit|72856949a1}}
* 17:30 bearND: mobileapps deploy failed on canary, rolled back
* 17:29 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}} (duration: 04m 00s)
* 17:25 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}}
* 17:24 elukey@deploy1001: Finished deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2 (duration: 00m 43s)
* 17:24 elukey@deploy1001: Started deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2
* 17:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 17:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1280.eqiad.wmnet
* 17:10 jynus: purging some old rows on pc1010 in a screen session to buy some time [[phab:T247788|T247788]]
* 16:56 mutante: mw1280 - scap pull - had ancient mw version due to downtime
* 16:46 mutante: mw1280 back after long downtime due to broken RAM, added back into puppet ([[phab:T240187|T240187]])
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Reverting All wikis to 1.35.0-wmf.23
* 15:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:52 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 05m 16s)
* 15:51 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:44 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 03m 49s)
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:01 hashar: scap prep 1.35.0-wmf.24 and applying security patches # [[phab:T233872|T233872]]
* 15:00 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:57 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:44 dcausse: wdqs1010 (test server) is running a data-reload cookbook (and is probably taking longer than the expected downtime)
* 14:38 hashar: mediawiki/core git push {{Gerrit|68bc9300dc}}:wmf/1.35.0-wmf.24  to catch up with a change that got merged while branch is being cut # [[phab:T233872|T233872]]
* 14:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]], take II (duration: 01m 04s)
* 14:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]] (duration: 01m 10s)
* 14:24 marostegui: Stop mysql and restart pc1008 [[phab:T247787|T247787]]
* 14:23 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:21 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580328{{!}}Store item terms as late as possible to avoid deadlocks (T247553 T246898)]] (duration: 01m 07s)
* 14:13 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:07 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 14:06 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:41 hashar: Branching 1.35.0-wmf.24 # [[phab:T233872|T233872]]
* 13:30 godog: stop puppet and turn on debug on icinga2001 - [[phab:T247538|T247538]]
* 12:06 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 12:06 cdanis@cumin1001: START - Cookbook sre.network.cf
* 11:46 godog: test pinning icinga to a subset of cpu on icinga1001
* 11:16 akosiaris: [[phab:T242461|T242461]] undeploy restrouter. The service is unused and, per the task, will not be used after all
* 11:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 11:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 11:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 10:56 XioNoX: add extra prepend to LG export filter
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:41 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:40 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:40 jbond42: sec update for libgraphicsmagick on maps
* 10:20 godog: bounce squid on install1003 [[phab:T247759|T247759]]
* 10:07 _joe_: sudo cumin -b2 -s 50 'A:mw-jobrunner' 'restart-php7.2-fpm' [[phab:T247622|T247622]]
* 10:03 Amir1: warming up cache for Q60M to Q70M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 10:02 ema: create kafka topic atskafka_test_webrequest_text [[phab:T247497|T247497]]
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 09:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]], take II (duration: 01m 05s)
* 09:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]] (duration: 01m 09s)
* 09:27 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 09:21 ema: cp: rolling varnish-frontend-restart to decrease memory usage and apply transient storage limits [[phab:T185968|T185968]]
* 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 08:39 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 00:57 krinkle@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Formatters/: {{Gerrit|Ic77b2c6b33a}}, [[phab:T247458|T247458]] (duration: 01m 12s)
 
== 2020-03-16 ==
* 23:14 tzatziki: reset email for "MNadrofsky (WMF)" on SUL and officewiki
* 20:58 mutante: mw1223 power down
* 20:54 mutante: powercycling mw1223
* 20:52 mutante: 5 old API appservers in eqiad removed
* 20:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[1-6].eqiad.wmnet
* 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:04 mutante: depool (yes->no) mw1221 - mw1226 ([[phab:T247780|T247780]])
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
* 19:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}} (duration: 06m 48s)
* 19:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:24 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:23 jynus: stop replication at pc1010 at pos pc1007-bin.080617:259138670
* 19:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}}
* 19:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 instead of pc1008 as pc1008 is overloaded (duration: 01m 06s)
* 18:38 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|I2c3217fb3da8bb65}} (duration: 01m 07s)
* 18:36 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op, courtesy of opcache (duration: 01m 06s)
* 18:34 krinkle@deploy1001: Synchronized docroot/noc/: {{Gerrit|I2c3217fb3}} (duration: 01m 07s)
* 18:18 mforns@deploy1001: Finished deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118 (duration: 13m 01s)
* 18:05 mforns@deploy1001: Started deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118
* 17:08 Amir1: warming up cache for Q50M to Q60M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 17:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]], take II (duration: 01m 08s)
* 17:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]] (duration: 01m 06s)
* 16:54 gehel: repooling wdqs1005
* 16:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enforce Content Security Policy if wmgUseCSP is set [[phab:T244124|T244124]] (duration: 01m 06s)
* 16:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseCSP false everywhere [[phab:T244124|T244124]] (duration: 01m 07s)
* 16:34 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I498e2ebd8c9}} (duration: 01m 07s)
* 16:33 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|I498e2ebd8c9}} (no-op) (duration: 01m 07s)
* 16:30 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|I870122f946d}} (duration: 01m 07s)
* 16:22 rlazarus: copied envoyproxy_1.13.1-1 from buster-wikimedia to stretch-wikimedia
* 16:21 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I08af45e2e47}} (duration: 01m 07s)
* 16:14 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|Ie9002d9095ee}} (duration: 01m 08s)
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-recursive_0.0.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-anaphora_0.0.4-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 15:02 moritzm: rolling restart of FPM/apache on netmon* to pick up libxslt security updates
* 14:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]], take II (duration: 01m 06s)
* 14:22 Amir1: warming up cache for Q40M to Q50M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 14:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]] (duration: 01m 07s)
* 14:16 moritzm: rolling restart of FPM on mw1261-mw1265 to pick up libxslt security updates
* 14:15 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 87500000 --to-id 87767570 --batch-size=10 --sleep=5 ([[phab:T219123|T219123]])
* 14:05 moritzm: installing libxslt security updates
* 13:49 ema: upload atskafka 0.1 to buster-wikimedia [[phab:T237993|T237993]]
* 13:42 gehel: restarting blazegraph on wdqs1007
* 13:30 gehel: depooling wdqs1005 to catch up on lag
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1015', diff saved to https://phabricator.wikimedia.org/P10706 and previous config saved to /var/cache/conftool/dbconfig/20200316-124309-marostegui.json
* 12:09 Amir1: warming up cache for Q35M to Q40M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 12:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]], take II (duration: 01m 07s)
* 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]] (duration: 01m 08s)
* 11:52 XioNoX: manually fix prometheus squid exporter on install1003
* 11:04 Amir1: ... for Q30M-Q35M of the new term store
* 11:04 Amir1: Warming up InnoDB buffer pool cache in db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 10:55 Amir1: warming up db1026 for up to Q35M for the new term store ([[phab:T219123|T219123]])
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10705 and previous config saved to /var/cache/conftool/dbconfig/20200316-104723-marostegui.json
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]), take II (duration: 01m 07s)
* 10:43 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]) (duration: 01m 13s)
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10704 and previous config saved to /var/cache/conftool/dbconfig/20200316-104002-marostegui.json
* 10:36 elukey: roll restart of recommendation service on scb* as an attempt to fix the flapping alerts - [[phab:T247732|T247732]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10703 and previous config saved to /var/cache/conftool/dbconfig/20200316-102829-marostegui.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10702 and previous config saved to /var/cache/conftool/dbconfig/20200316-101707-marostegui.json
* 10:10 marostegui: Stop mysql for upgrade on es1015 [[phab:T239791|T239791]]
* 10:02 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=0 --file=15march2217-holes-nulls.list on screen ([[phab:T219123|T219123]])
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for upgrade and restart [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10701 and previous config saved to /var/cache/conftool/dbconfig/20200316-093228-marostegui.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1011 to es2 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10700 and previous config saved to /var/cache/conftool/dbconfig/20200316-093048-marostegui.json
* 08:16 marostegui: Review and enable events on recently migrated 10.4 hosts - [[phab:T247728|T247728]]
* 08:02 ema: cp4025 restart trafficserver-tls to clear 'tls process restarted' alert [[phab:T241593|T241593]] [[phab:T185968|T185968]]
* 07:57 moritzm: installing libxslt security updates
* 07:52 ema: cp4025: restart varnish-fe to clear 'child restarted' alert [[phab:T185968|T185968]]
* 07:47 moritzm: installing lxml security updates
* 07:14 moritzm: installing libgd2 security updates on jessie
* 06:54 moritzm: removing some library packages from jessie/stretch after labstore1006/1007 dist-upgrade to buster
* 06:38 _joe_: restart envoy with 10 requests per connection on mw2231, [[phab:T247484|T247484]]
 
== 2020-03-15 ==
* 23:20 jynus: removed oldest snapshots on dbprov1001
* 13:27 dcausse: restarting blazegraph on wdqs1005 [[phab:T242453|T242453]]
* 07:01 marostegui: Restart logrotate on db1107
 
== 2020-03-14 ==
* 08:33 elukey: run kafka preferred-replica-election on kafka-jumbo1001 - [[phab:T247561|T247561]]
* 08:32 elukey: run systemctl restart systemd-timedated.service on stat1008
* 01:06 mutante: planet1001 - copying /etc/apt/sources.list from planet2001 to planet1001 - apt-get update - apt-get install openssh-server [[phab:T247592|T247592]]
 
== 2020-03-13 ==
* 23:12 bstorm_: rebooting labstore1006 for upgrade to stretch [[phab:T224583|T224583]]
* 22:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:45 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 22:27 bstorm_: rebooting labstore1006 [[phab:T224583|T224583]]
* 22:21 bstorm_: downtimed labstore1006 for upgrades [[phab:T224583|T224583]]
* 20:02 mutante: stat1005 - ip link set en01 down ; ip link set en01 up ([[phab:T247561|T247561]])
* 19:30 bstorm_: rebooting labstore1007 for upgrade to buster [[phab:T224583|T224583]]
* 18:51 shdubsh: test increase fs.inotify.max_user_watches on prometheus2004
* 17:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:21 mutante: removed squid from install1002/install2002 (formerly webproxy.(eqiad{{!}}codfw).wmnet until 2 days ago, replaced by install1003/install2003) [[phab:T224576|T224576]]
* 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 17:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:08 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 17:00 krinkle@deploy1001: Synchronized dblists/: {{Gerrit|If4d17082f}}, {{Gerrit|Iadba5b01b}}, {{Gerrit|Ibe16d5f09}} (duration: 01m 07s)
* 16:58 krinkle@deploy1001: Synchronized wmf-config/config/: {{Gerrit|Ibe16d5f09}} (duration: 01m 10s)
* 16:51 bstorm_: rebooting labstore1007 for stretch upgrade [[phab:T224583|T224583]]
* 16:37 krinkle@deploy1001: Synchronized wmf-config/config/: {{Gerrit|If4d17082f}}, {{Gerrit|Iadba5b01b}} (duration: 01m 11s)
* 16:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 bstorm_: rebooting labstore1007 for first cycle of upgrades [[phab:T224583|T224583]]
* 16:02 elukey: powercycle kafka-jumbo1006 after switch port changed - [[phab:T247561|T247561]]
* 15:28 _joe_: switch envoy logging to debug on mw2231
* 14:57 cdanis: [[phab:T247586|T247586]] ✔️ cdanis@grafana1002.eqiad.wmnet ~ 🕥☕ sudo systemctl restart apache2.service
* 12:48 Urbanecm: Password reset for SUL User:FuduBot ([[phab:T247601|T247601]])
* 12:16 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
* 10:26 moritzm: installing python-werkzeug security updates
* 10:09 vgutierrez: upload trafficserver 8.0.6-1wm3 to apt.wm.o (buster) - [[phab:T245616|T245616]]
* 09:55 _joe_: running puppet across appservers to switch to http for eventgate-analytics [[phab:T247484|T247484]]
* 09:17 moritzm: installing perl updates from Stretch point release
* 06:16 vgutierrez: triggering OCSP response updates in eqiad,codfw and ulsfo - [[phab:T247584|T247584]]
* 06:12 vgutierrez: triggering OCSP response updates in eqsin - [[phab:T247584|T247584]]
* 06:05 vgutierrez: triggering OCSP response updates in esams - [[phab:T247584|T247584]]
* 00:20 shdubsh: reload prometheus@ops on prometheus1003
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw215[8-9].codfw.wmnet
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw216[0-9].codfw.wmnet
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw217[1-2].codfw.wmnet
* 00:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-03-12 ==
* 23:58 shdubsh: reload prometheus@ops on prometheus1004
* 23:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[1-2].codfw.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet
* 23:40 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw215[89].codfw.wmnet
* 23:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw215[89].codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2178.codfw.wmnet
* 23:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[0-6].codfw.wmnet
* 22:45 krinkle@deploy1001: Synchronized multiversion/: {{Gerrit|I403a9890a9}} (duration: 01m 07s)
* 22:44 krinkle@deploy1001: Synchronized dblists/: {{Gerrit|I403a9890a9}} (duration: 01m 09s)
* 22:41 mforns@deploy1001: Finished deploy [analytics/refinery@906bd1e]: deploying refinery together with refinery-source v0.0.118 (duration: 12m 20s)
* 22:28 mforns@deploy1001: Started deploy [analytics/refinery@906bd1e]: deploying refinery together with refinery-source v0.0.118
* 22:15 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:15 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 22:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:07 bstorm_: moving all nfs traffic off labstore1007 and to labstore1006 for upgrades [[phab:T224583|T224583]]
* 22:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:05 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 22:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:47 mutante: doc1001 - had to manually run "/usr/local/sbin/build-envoy-config -c /etc/envoy/" to get envoy tls_terminator_443 listener into the config or envoy would not listen on 443 ([[phab:T210411|T210411]])
* 21:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 21:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:06 foks: remove one file for legal compliance
* 20:49 ottomata: kafka-jumbo1006 - stopping kafka and powercycling - [[phab:T247561|T247561]]
* 20:15 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.35.0-wmf.23"
* 20:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.23
* 20:10 mutante: revoking puppet cert for doc.discovery.wmnet, re-creating with doc.wikimedia.org as SAN
* 20:09 eileen: civicrm revision changed from {{Gerrit|a301076871}} to {{Gerrit|a1b2cbeac1}}, config revision is {{Gerrit|37232d8460}}
* 19:46 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set term store to WRITE_BOTH for all of Wikidata", take II (duration: 01m 06s)
* 19:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set term store to WRITE_BOTH for all of Wikidata" (duration: 01m 08s)
* 19:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:34 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: cirrus: Start Glent m0 AB test (duration: 01m 07s)
* 18:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: re-sync InitialiseSettings.php (duration: 01m 08s)
* 18:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579326{{!}}Set term store to WRITE_BOTH for all of Wikidata (T219123)]] (duration: 01m 07s)
* 18:23 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579348{{!}}Switch kowiki to use ORES for suggested edits topics]] (duration: 01m 08s)
* 18:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:48 elukey: increase all max ticket lifetimes of Kerberos user principals on krb1001's KDC via 'kadmin.local modprinc -maxlife 2d $user' (changes will be propagated to codfw automatically)
* 17:48 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:17 elukey: execute modprinc -maxlife 2d krbtgt/WIKIMEDIA via kadmin.local on krb1001 (will be propagated to 2001 automatically)
* 17:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:06 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:28 volans: restarting icinga, acting up on command file (frack awol and downtimes)
* 16:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 rlazarus: uploading envoyproxy_1.13.1-1 (upgrade from 1.12.2) [[phab:T246868|T246868]]
* 14:51 elukey: restart kpropd daemon on krb2001
* 14:26 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:23 volans@cumin2001: START - Cookbook sre.dns.netbox
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:35 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
* 13:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 13:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 13:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:33 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:29 volans@cumin2001: START - Cookbook sre.dns.netbox
* 12:00 tarrow: EU SWAT done
* 12:00 tarrow@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/TwoColConflict: SWAT: [[gerrit:579221{{!}}Detect whether an edit came from VisualEditor (T245722)]] (duration: 01m 10s)
* 11:42 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:42 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:37 volans@cumin2001: START - Cookbook sre.dns.netbox
* 11:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:09 elukey: roll restart of krb-kdc on krb1001/krb2001 to pick up new ticket lifetime settings (10h -> 48h)
* 11:09 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:09 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:02 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:59 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:39 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:29 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:28 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:28 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:13 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:13 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 09:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 09:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:55 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch ores to use envoy (duration: 01m 08s)
* 08:36 addshore: start "rebuild" of Q87 -> 87.5 million for [[phab:T219123|T219123]]
* 08:27 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 ([[phab:T219123|T219123]]) cache bust (duration: 01m 08s)
* 08:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 ([[phab:T219123|T219123]]) (duration: 01m 12s)
* 08:12 elukey: push new install/webproxy terms for analytics-in4/6 to cr1/cr2-eqiad
* 07:28 kart_: Updated cxserver charts to 0.0.13
* 07:26 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 07:24 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 07:22 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 06:14 kart_: Updated cxserver to 2020-03-12-041806-production and added sectionmapping db config ([[phab:T246316|T246316]], [[phab:T243430|T243430]], [[phab:T202276|T202276]])
* 06:11 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 06:08 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 06:03 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 01:51 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/WikimediaEditorTasks: Revert 'Fix revert counting for non-language-specific counters' ([[phab:T247479|T247479]]) (duration: 01m 08s)
* 01:13 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@4e2ea09]: resolve deadlock in bulk_daemon (duration: 10m 05s)
* 01:03 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@4e2ea09]: resolve deadlock in bulk_daemon
* 00:56 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: wait around for counts to match up in reindexer before giving up (duration: 01m 08s)
* 00:53 ebernhardson: wmf.23 cirrussearch: wait around for counts to match before giving up
* 00:52 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: (no justification provided) (duration: 01m 12s)
* 00:23 mutante: switching webproxy.eqiad.wmnet / webproxy.codfw.wmnet to install[12]003 (squids on buster)
* 00:16 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable depicts counter due to code revert ([[phab:T244974|T244974]]), take 2 (duration: 01m 07s)
* 00:14 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable depicts counter due to code revert ([[phab:T244974|T244974]]) (duration: 01m 07s)
* 00:00 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Revert 'Fix revert counting for non-language-specific counters' ([[phab:T247479|T247479]]) (duration: 01m 07s)
 
== 2020-03-11 ==
* 23:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable depicts counter ([[phab:T244974|T244974]]) (Simon says) (duration: 01m 07s)
* 23:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable depicts counter ([[phab:T244974|T244974]]) (duration: 01m 07s)
* 23:51 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:51 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters (duration: 01m 08s)
* 23:40 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters (duration: 01m 11s)
* 23:18 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|I91b3a18317af}} (duration: 01m 08s)
* 22:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 22:39 volans@cumin2001: START - Cookbook sre.dns.netbox
* 22:28 mutante: depooled mw2167 through mw2172 - rack C3 ([[phab:T247018|T247018]])
* 22:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[012].codfw.wmnet
* 22:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[789].codfw.wmnet
* 22:16 James_F: Purged trwiki logos from ATS/Varnish for [[phab:T247445|T247445]]
* 22:15 jforrester@deploy1001: Synchronized static/images/project-logos/: [trwiki] Restore pre-unblocking celebration logo versions [[phab:T247445|T247445]] (duration: 01m 09s)
* 21:42 ebernhardson: stop all mjolnir-kafka-bulk-daemons in eqiad except 1 to assist debugging
* 21:33 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2726268]: Downgrade kafka_python to 1.4.3 (duration: 05m 45s)
* 21:27 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2726268]: Downgrade kafka_python to 1.4.3
* 20:53 cdanis@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:52 cdanis@cumin2001: START - Cookbook sre.hosts.decommission
* 20:26 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.23 (duration: 01m 03s)
* 20:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.23
* 20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:53 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:36 ejegg: updated payments-wiki from {{Gerrit|03765b53de}} to {{Gerrit|86ce0361f9}}
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 volans: temporarily disabled puppet on A:dns-auth to deploy g/578506 [[phab:T233183|T233183]]
* 18:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 18:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgParsoidVariant, no longer read [[phab:T229015|T229015]] (duration: 01m 07s)
* 18:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop using wmgParsoidVariant, no longer varied [[phab:T229015|T229015]] (duration: 01m 08s)
* 17:53 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:53 moritzm: removed cas-2020-03-09.log and cas-2020-03-10.log on idp2001 (huge logs due to some debug log level for tracking down a performance issue)
* 16:36 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:25 liw: restarting Zuul to clear queues (in collab with James F)
* 14:49 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:41 volans: installed spicerack to 0.0.32-1 on cumin[12]001
* 14:25 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 11s)
* 14:24 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:23 akosiaris@deploy1001: sync aborted: wmf-config/ProductionServices.php (duration: 02m 42s)
* 14:22 volans: uploaded spicerack_0.0.32-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 14:21 akosiaris: switch mediawiki to talk to eventgate-analytics via envoy
* 14:21 akosiaris@deploy1001: Started scap: wmf-config/ProductionServices.php
* 14:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 akosiaris: [[phab:T239779|T239779]] upload apertium-swe-nor_0.3.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-swe-dan_0.8.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-nno-nob_1.3.0-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-dan-nor_1.4.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 13:01 thcipriani: restarting gerrit unstuck the zuul server ([[phab:T246973|T246973]])
* 12:54 thcipriani: restarting gerrit to try to fix thread deadlock on zuul (cf: [[phab:T246973|T246973]])
* 12:43 akosiaris: disconnect+connect jenkins from gearman server.
* 12:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:23 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:23 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:00 Lucas_WMDE: EU SWAT done
* 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT (prod no-op): [[gerrit:578520{{!}}Don't use TwoColConflict as beta feature on labs (T247292)]], take II (duration: 01m 07s)
* 11:59 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT (prod no-op): [[gerrit:578520{{!}}Don't use TwoColConflict as beta feature on labs (T247292)]] (duration: 01m 09s)
* 11:56 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikibaseCirrusSearch/: SWAT: [[gerrit:578805{{!}}Wrap property EntitySearchHelper in PropertyDataTypeSearchHelper]] (duration: 01m 05s)
* 11:48 vgutierrez: restarting ats-backend on cp2004
* 11:25 moritzm: restarting slapd on serpens/seaborgium to pick up libidn security updates
* 11:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 11:16 _joe_: restarting zuul and zuul-merger on contint1001, they're stuck
* 11:11 moritzm: restarting exim on MXes to pick up libidn security updates
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Give normal 100 weight to es3 old masters - [[phab:T246072|T246072]]', diff saved to https://phabricator.wikimedia.org/P10685 and previous config saved to /var/cache/conftool/dbconfig/20200311-110334-marostegui.json
* 10:59 marostegui: Remove Mostrevisions from mwmaint1002 [[phab:T239072|T239072]]
* 10:42 vgutierrez: pool ncredir5002 - [[phab:T243391|T243391]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly give weight to es3 old masters - [[phab:T246072|T246072]]', diff saved to https://phabricator.wikimedia.org/P10684 and previous config saved to /var/cache/conftool/dbconfig/20200311-103802-marostegui.json
* 10:34 moritzm: restarting Apache on graphite*, kibana, netmon* to pick up libidn security updates
* 09:53 moritzm: installing postgresql-9.6 security updates on maps*
* 09:46 vgutierrez: depool and reimage ncredir5002 with buster - [[phab:T243391|T243391]]
* 09:43 marostegui: Finish es3 maintenance window [[phab:T246072|T246072]]
* 09:29 marostegui: Disconnect replication on all es3 hosts [[phab:T246072|T246072]]
* 09:18 marostegui: Set es1017 (es3 master) in read only on mysql [[phab:T246072|T246072]]
* 09:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set es3 as RO - [[phab:T246072|T246072]] (duration: 01m 08s)
* 09:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Set es3 as RO - [[phab:T246072|T246072]] (duration: 01m 08s)
* 09:01 moritzm: restarting Apache on puppetboard, people.wikimedia.org, webperf*, bromine, miscweb* to pick up libidn security updates
* 08:40 moritzm: installing libidn security updates
* 08:33 moritzm: installing libvpx security updates
* 08:10 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch wdqs-internal to use envoy (duration: 01m 21s)
* 07:38 marostegui: fixcopyrightwiki_p views from labs hosts [[phab:T246055|T246055]]
* 01:40 ejegg: restarted recurring donation charge jobs
* 01:27 ejegg: restarted fundraising orphan donation rectifier jobs
* 01:20 ejegg: updated fundraising CiviCRM from {{Gerrit|c4b81b19b0}} to {{Gerrit|a301076871}}
* 01:19 ejegg: disabled orphan rectifier jobs for upgrade
* 00:24 eileen: civicrm revision changed from {{Gerrit|35651da117}} to {{Gerrit|c4b81b19b0}}, config revision is {{Gerrit|71c8cda115}}
* 00:16 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2375.codfw.wmnet
* 00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw237[0246].codfw.wmnet
* 00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw236[68].codfw.wmnet
* 00:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw23[66-76].codfw.wmnet
 
== 2020-03-10 ==
* 23:53 volker-e@deploy1001: Finished deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:  (duration: 00m 05s)
* 23:53 volker-e@deploy1001: Started deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:
* 23:50 ejegg: disabled recurring donation charge jobs for upgrade
* 23:48 mutante: mw2376 - systemctl start apache2
* 23:45 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2376.codfw.wmnet
* 23:45 ebernhardson: start in-place reindex procedure on kowiki against eqiad and codfw
* 23:44 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: (no justification provided) (duration: 01m 07s)
* 23:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 23:38 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: cirrus: Wait around after a refresh before counting docs (duration: 01m 08s)
* 23:37 mutante: mw2366 - systemctl start nutcracker
* 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw237[135].codfw.wmnet
* 23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw236[579].codfw.wmnet
* 23:05 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|Ib5473af6}} (duration: 01m 07s)
* 23:02 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|Ib5473af6}} (duration: 01m 07s)
* 22:58 krinkle@deploy1001: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|Ib5473af6}} (duration: 01m 07s)
* 22:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw236[0-5].codfw.wmnet
* 22:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw235[0-9].codfw.wmnet
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw237[0-4].codfw.wmnet
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw236[0-9].codfw.wmnet
* 22:11 mutante: mw2359 sudo systemctl start php7.2-fpm_check_restart
* 22:09 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw235[0-9].codfw.wmnet
* 21:58 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@dda3d28]: re-sync latest version to trigger scap scripts on new elastic nodes in codfw (duration: 02m 15s)
* 21:56 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@dda3d28]: re-sync latest version to trigger scap scripts on new elastic nodes in codfw
* 21:51 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@dda3d28]: re-sync latest version to trigger scap scripts on new elastic nodes in codfw (duration: 00m 23s)
* 21:51 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@dda3d28]: re-sync latest version to trigger scap scripts on new elastic nodes in codfw
* 21:38 volker-e@deploy1001: Finished deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:  (duration: 00m 07s)
* 21:38 volker-e@deploy1001: Started deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.23
* 20:39 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.23 and rebuild l10n cache (duration: 163m 37s)
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 eileen: civicrm revision changed from {{Gerrit|10506a9644}} to {{Gerrit|3de711ed49}}, config revision is {{Gerrit|2d7b926c1d}}
* 20:10 eileen: process-control config revision is {{Gerrit|2d7b926c1d}}
* 19:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 19:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 19:38 mutante: gerrit1001 - /var/log/syslog empty and 2 rsyslogd procs running, killing one of them, stopping the other, letting puppet run
* 19:37 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:34 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:34 volker-e@deploy1001: Finished deploy [design/style-guide@62bf7c6]: Deploy design/style-guide:  (duration: 00m 06s)
* 19:34 volker-e@deploy1001: Started deploy [design/style-guide@62bf7c6]: Deploy design/style-guide:
* 19:32 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 19:31 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:30 brennen: scap-cdb-rebuild currently at 29%; at present rate wmf.23 will roll to group0 a bit after the official window
* 19:29 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:22 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 19:19 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 19:12 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:12 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 19:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:00 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:00 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:56 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 18:56 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 18:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 18:36 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:36 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:33 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:33 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:55 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.23 and rebuild l10n cache
* 17:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88b3e14]: Update predictions dag with new cli parameters (duration: 01m 00s)
* 17:33 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88b3e14]: Update predictions dag with new cli parameters
* 17:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 17:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [nlwiki] Enable WikiLove [[phab:T247286|T247286]] (duration: 00m 59s)
* 17:27 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@6c2ee13]: Update mobileapps to {{Gerrit|304fb43}} (duration: 08m 09s)
* 17:25 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:25 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:24 James_F: Ran mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=nlwiki wikilove for [[phab:T247286|T247286]]
* 17:23 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:23 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:19 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@6c2ee13]: Update mobileapps to {{Gerrit|304fb43}}
* 17:18 brennen: 1.35.0-wmf.23 was branched at {{Gerrit|8e3738cc2f0665d19c1ff758a1f16eebae0039dd}} for [[phab:T233871|T233871]]
* 16:50 brennen: starting branch cut for wmf/1.35.0-wmf.23 - [[phab:T233871|T233871]]
* 16:22 volker-e@deploy1001: Finished deploy [design/style-guide@14bb669]: Deploy design/style-guide:  (duration: 00m 08s)
* 16:21 volker-e@deploy1001: Started deploy [design/style-guide@14bb669]: Deploy design/style-guide:
* 16:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d182ca7]: Build airflow venvs from stat1007 (duration: 00m 45s)
* 16:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d182ca7]: Build airflow venvs from stat1007
* 16:05 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch termbox to use envoy (duration: 00m 59s)
* 15:48 vgutierrez: re-enabling session id based caching on ulsfo (along with tls session tickets) - [[phab:T245616|T245616]]
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10677 and previous config saved to /var/cache/conftool/dbconfig/20200310-144817-root.json
* 14:42 akosiaris: [[phab:T233700|T233700]] upload apertium-fra-cat_1.7.0-1+wmf1_amd64.changes to apt.wikimedia.org/jessie-wikimedia main
* 14:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=eventstreams,name=scb.*
* 14:35 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=eventstreams,name=scb.*
* 14:35 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=eventstreams,name=scb.*
* 14:34 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=eqiad,service=eventstreams,name=scb.*
* 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 vgutierrez: Switch to TLS session tickets on ulsfo - [[phab:T245616|T245616]]
* 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 vgutierrez: reboot cp4026 - [[phab:T245616|T245616]]
* 14:00 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch echostore to use envoy (duration: 00m 57s)
* 13:52 marostegui: Stop mysql on db2121 for reimage to buster [[phab:T246604|T246604]]
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10676 and previous config saved to /var/cache/conftool/dbconfig/20200310-134648-marostegui.json
* 13:45 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,service=eventstreams,name=kubernetes.*
* 13:44 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=eventstreams,name=kubernetes.*
* 13:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MediaWiki client side error logging on hawwiki (take 2) - [[phab:T246030|T246030]] (duration: 00m 57s)
* 13:40 akosiaris: bump eventstreams on scb1003 to force users to reconnect, hoping more connections will make it to kubernetes hosts
* 13:35 akosiaris: pool all kubernetes hosts in eqiad for eventstreams. weight=2 which means ~20% of requests are going to be served by kubernetes
* 13:34 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=eventstreams,name=kubernetes.*
* 13:34 akosiaris@cumin1001: conftool action : set/weight=2; selector: dc=eqiad,service=eventstreams,name=kubernetes.*
* 13:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MediaWiki client side error logging on hawwiki - [[phab:T246030|T246030]] (duration: 00m 58s)
* 13:29 akosiaris: [[phab:T202360|T202360]] upload apertium-oci-fra_0.3.0-1+wmf1_amd64.changes to apt.wikimedia.org/jessie-wikimedia main
* 13:25 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:23 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 13:23 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:16 vgutierrez: upgrade ATS on ulsfo to 8.0.6-1wm2 - [[phab:T245616|T245616]]
* 13:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:56 vgutierrez: upload trafficserver 8.0.6-1wm2 to apt.wm.o (buster) - [[phab:T245616|T245616]]
* 11:41 Lucas_WMDE: EU SWAT done
* 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/EventLogging/: SWAT: [[gerrit:578317{{!}}Make BackgroundQueue more aware of page unload flow (T246382, T244874)]] (duration: 00m 58s)
* 11:30 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
* 11:27 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
* 11:26 marostegui: Restart mysqld exporter on db2125 to see if the collection errors decrease from 30 [[phab:T247290|T247290]]
* 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/DiscussionTools/: SWAT: [[gerrit:578364{{!}}controller: apply ve.fixBase to the parsed Parsoid response (T245781)]] (duration: 00m 59s)
* 09:38 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 09:37 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 09:36 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 09:34 marostegui: es5 deployment window finished [[phab:T246072|T246072]]
* 09:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:30 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 09:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Enable es5 as new writable external store section - [[phab:T246072|T246072]] (duration: 00m 57s)
* 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Enable es5 as new writable external store section - [[phab:T246072|T246072]] (duration: 00m 58s)
* 09:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Enable es5 as new writable external store section - [[phab:T246072|T246072]] (duration: 00m 59s)
* 09:21 akosiaris: update blubberoid, cxserver, citoid to push the TLS resources changes [[phab:T244843|T244843]]
* 09:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 09:21 akosiaris: update blubberoid, cxserver, citoid to push the TLS resources changes
* 09:20 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 09:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 09:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add es5 to the available es sections, not in use yet - [[phab:T246072|T246072]] (duration: 00m 59s)
* 09:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add es5 to the available es sections, not in use yet - [[phab:T246072|T246072]] (duration: 01m 01s)
* 09:00 marostegui: Start es5 deployment window [[phab:T246072|T246072]]
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1012', diff saved to https://phabricator.wikimedia.org/P10673 and previous config saved to /var/cache/conftool/dbconfig/20200310-085001-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 back to es1 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10671 and previous config saved to /var/cache/conftool/dbconfig/20200310-082552-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1012', diff saved to https://phabricator.wikimedia.org/P10670 and previous config saved to /var/cache/conftool/dbconfig/20200310-082525-marostegui.json
* 05:36 vgutierrez: restart ats-be on cp4032 to clean up the restart alert - [[phab:T247232|T247232]]
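
Several entries above set conftool weights for eventstreams, for example weight=2 on the kubernetes hosts at 13:34, with the 13:35 note estimating that about 20% of requests would then be served by kubernetes. The following is only a rough sketch of the arithmetic behind such an estimate, assuming traffic is split in proportion to the summed weights of pooled hosts. The host counts and the scb weight are hypothetical values chosen for illustration and are not taken from the log.

```python
def pool_share(groups, target):
    """Return the fraction of traffic the `target` group would receive,
    assuming requests are distributed in proportion to pooled weight."""
    total = sum(count * weight for count, weight in groups.values())
    count, weight = groups[target]
    return count * weight / total


# Hypothetical pool: group name -> (number of pooled hosts, weight per host).
groups = {
    "kubernetes": (4, 2),  # weight=2 as in the 13:34 entry; host count assumed
    "scb": (4, 8),         # both values assumed for illustration
}

share = pool_share(groups, "kubernetes")
print(f"{share:.0%}")  # prints "20%" for the assumed numbers above
```

With long-lived connections such as those of eventstreams, the observed split may lag behind the configured weights until clients reconnect, which matches the 13:40 entry about forcing reconnects.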
 
== 2020-03-09 ==
* 23:21 catrope@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle exemptions (duration: 01m 00s)
* 23:15 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create Define/Define talk: namespace on scowiki (duration: 01m 00s)
* 16:20 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: revert: switch eventgate-analytics to use envoy (duration: 00m 59s)
* 16:15 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch eventgate-analytics to use envoy (duration: 01m 05s)
* 16:11 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1012', diff saved to https://phabricator.wikimedia.org/P10668 and previous config saved to /var/cache/conftool/dbconfig/20200309-154627-marostegui.json
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1012', diff saved to https://phabricator.wikimedia.org/P10667 and previous config saved to /var/cache/conftool/dbconfig/20200309-153515-marostegui.json
* 15:29 marostegui: Upgrade mysql on es1012 [[phab:T239791|T239791]]
* 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2125 - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10666 and previous config saved to /var/cache/conftool/dbconfig/20200309-152427-marostegui.json
* 15:18 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch mathoid to use envoy (duration: 00m 59s)
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10665 and previous config saved to /var/cache/conftool/dbconfig/20200309-151751-marostegui.json
* 15:14 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1016 to es1 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10664 and previous config saved to /var/cache/conftool/dbconfig/20200309-151310-marostegui.json
* 15:13 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:12 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:06 marostegui: Restart mysql on db1116 (the previous one was db1102) for upgrade
* 14:57 marostegui: Restart mysql for upgrade
* 14:56 hoo: Updated the Wikidata property suggester with data from the 2020-03-02 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10663 and previous config saved to /var/cache/conftool/dbconfig/20200309-145232-marostegui.json
* 14:48 marostegui: Restart and upgrade mysql on db1121 [[phab:T239791|T239791]]
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10662 and previous config saved to /var/cache/conftool/dbconfig/20200309-144752-marostegui.json
* 14:41 godog: roll restart logstash in codfw / eqiad - [[phab:T226986|T226986]]
* 13:52 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 13:49 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 12:30 akosiaris: upload apertium 3.6.1, cg3 1.3.1, lttoolbox 3.5.1, apertium-lex-tools 0.2.3 to apt.wikimedia.org/jessie-wikimedia main. [[phab:T234182|T234182]]
* 12:06 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch sessionstore to use envoy permanently (duration: 00m 59s)
* 11:25 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: test: switch sessionstore to use envoy again (duration: 00m 57s)
* 11:10 Amir1: EU SWAT is done
* 11:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:577671{{!}}Add `fkv` Kven to $wmgExtraLanguageNames (T167259)]], take II (duration: 00m 57s)
* 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:577671{{!}}Add `fkv` Kven to $wmgExtraLanguageNames (T167259)]] (duration: 00m 59s)
* 10:58 vgutierrez: upload pystemd 0.7.0-1wm1 to apt.wm.o (buster) - [[phab:T245616|T245616]]
* 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:578305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:578305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:34 moritzm: install spamassassin security updates on fermium/lists.wikimedia.org
* 10:32 moritzm: install spamassassin security updates on mendelevium/ticket.wikimedia.org
* 10:26 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 10:12 moritzm: installing openjdk-7 security updates
* 10:04 vgutierrez: disable parent proxies globally on ats-tls - [[phab:T244464|T244464]]
* 10:00 moritzm: installing php5 security updates
* 09:51 gehel: pooling new elastic20[55-60] servers - [[phab:T246975|T246975]]
* 09:48 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: re-revert: switch sessionstore to use envoy (duration: 00m 35s)
* 09:39 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: re-try: switch sessionstore to use envoy (duration: 00m 58s)
* 09:14 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: revert switch sessionstore to use envoy (duration: 00m 58s)
* 09:08 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch sessionstore to use envoy (duration: 01m 00s)
* 09:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10658 and previous config saved to /var/cache/conftool/dbconfig/20200309-083711-marostegui.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2126', diff saved to https://phabricator.wikimedia.org/P10657 and previous config saved to /var/cache/conftool/dbconfig/20200309-083653-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2126 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10656 and previous config saved to /var/cache/conftool/dbconfig/20200309-082118-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2114 after reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10655 and previous config saved to /var/cache/conftool/dbconfig/20200309-074629-marostegui.json
* 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:13 marostegui: Stop MySQL on db2114 to upgrade to buster
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10654 and previous config saved to /var/cache/conftool/dbconfig/20200309-070937-marostegui.json
* 05:34 vgutierrez: restart ats-tls, ats-be and varnish-fe on cp3053 to clean up daemon restart alerts - [[phab:T247195|T247195]]
 
== 2020-03-08 ==
* 17:58 elukey: restart hadoop-yarn-nodemanager on an-worker1087
* 17:17 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wmgDisableAccountCreation (duration: 00m 56s)
* 17:15 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgDisableAccountCreation (duration: 00m 59s)
* 05:16 thcipriani: restart gerrit-replica as it's OOM [[phab:T247182|T247182]]
 
== 2020-03-07 ==
* 12:48 reedy@deploy1001: Synchronized wmf-config/throttle.php: [[phab:T247149|T247149]] (duration: 01m 07s)
* 01:35 reedy@deploy1001: Synchronized wmf-config/throttle.php: tidy up (duration: 00m 56s)
 
== 2020-03-06 ==
* 23:50 mutante: install1003/2003 - starting DHCP servers and letting puppet stop them again to clear systemd state
* 23:04 mutante: signing puppet certs for install1003/install2003, initial puppet runs
* 22:33 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: [[phab:T247091|T247091]] (duration: 00m 57s)
* 22:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@18f13e4]: update to python3.7, ship articletopic propagation (duration: 00m 36s)
* 22:08 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@18f13e4]: update to python3.7, ship articletopic propagation
* 20:23 ebernhardson: post-deploy restart mjolnir bulk and msearch daemons across eqiad and codfw
* 20:07 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@dda3d28]: Re-deploy python3.7 upgrade (duration: 05m 14s)
* 20:02 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@dda3d28]: Re-deploy python3.7 upgrade
* 19:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:48 mutante: re-creating install1003 and install2003 with same specs as before but public IP ([[phab:T244390|T244390]])
* 19:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:07 mutante: sudo -i cumin -b 15 'mw23[25-34].codfw.wmnet' 'sudo -u dzahn scap pull'
* 18:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw233[0-4].codfw.wmnet
* 18:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw232[5-9].codfw.wmnet
* 18:05 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[0-4].codfw.wmnet
* 18:04 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw232[5-9].codfw.wmnet
* 17:42 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|I260bafdb8e}} (no-op) (duration: 01m 00s)
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:54 reedy@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/WikimediaMaintenance/dumpInterwiki.php: [[phab:T247097|T247097]] (duration: 01m 00s)
* 16:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 moritzm: installing libtimedate-perl updates from Stretch point release
* 15:07 reedy@deploy1001: Synchronized langlist-labs: [[phab:T247091|T247091]] (duration: 01m 05s)
* 14:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 14:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 14:44 XioNoX: add cloud-out4 firewall filter in codfw - [[phab:T246887|T246887]]
* 11:56 akosiaris: [[phab:T238658|T238658]]. kubernetes1001 pooled for eventstreams, weight=1 which should account for 2.1% of traffic
* 11:51 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=eventstreams,name=kubernetes1001.*
* 11:50 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=eventstreams,name=kube.*
* 10:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 10:10 moritzm: rolling restart of Exim on mx* to pick up libidn security updates
* 10:06 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074', diff saved to https://phabricator.wikimedia.org/P10648 and previous config saved to /var/cache/conftool/dbconfig/20200306-100628-marostegui.json
* 10:03 moritzm: rolling restart of labweb* to pick up libidn security updates
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314, db2084:3315 after reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10647 and previous config saved to /var/cache/conftool/dbconfig/20200306-095407-marostegui.json
* 09:52 moritzm: rolling restart of slapd on LDAP replicas to pick up libidn security updates
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P10646 and previous config saved to /var/cache/conftool/dbconfig/20200306-095115-marostegui.json
* 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 09:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:43 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 09:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:21 marostegui: Stop MySQL on db2084:3315, db2084:3314 for reimage [[phab:T246604|T246604]]
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3314, db2084:3315 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10645 and previous config saved to /var/cache/conftool/dbconfig/20200306-092103-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P10644 and previous config saved to /var/cache/conftool/dbconfig/20200306-092026-marostegui.json
* 09:12 moritzm: rolling restart of mw canaries to pick up libidn security updates
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P10643 and previous config saved to /var/cache/conftool/dbconfig/20200306-090328-marostegui.json
* 09:00 moritzm: installing libidn security updates
* 08:56 moritzm: rolling restart of kartotherian/tilerator/tileratorui to pick up OpenJPEG security updates
* 08:56 marostegui: Stop MySQL on db1074 for upgrade [[phab:T239791|T239791]]
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for upgrade [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10642 and previous config saved to /var/cache/conftool/dbconfig/20200306-085435-marostegui.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315, db1113:3316 after upgrade - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10641 and previous config saved to /var/cache/conftool/dbconfig/20200306-085332-marostegui.json
* 08:47 marostegui: Stop mysql for db1113:3315, db1113:3316 for upgrade [[phab:T239791|T239791]]
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315, db1113:3316 for upgrade - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10640 and previous config saved to /var/cache/conftool/dbconfig/20200306-084439-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078 [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10639 and previous config saved to /var/cache/conftool/dbconfig/20200306-084141-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311, db2085:3318 after reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10638 and previous config saved to /var/cache/conftool/dbconfig/20200306-082858-marostegui.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10637 and previous config saved to /var/cache/conftool/dbconfig/20200306-082549-marostegui.json
* 08:19 moritzm: installing openjpeg2 security updates
* 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:50 marostegui: Stop MySQL on db2085:3311, db2085:3318 for reimage to buster [[phab:T246604|T246604]]
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311, db2085:3318 for reimage to buster - [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10636 and previous config saved to /var/cache/conftool/dbconfig/20200306-074427-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10635 and previous config saved to /var/cache/conftool/dbconfig/20200306-073707-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10634 and previous config saved to /var/cache/conftool/dbconfig/20200306-070538-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Install 10.4 instead of 10.3 on db1078', diff saved to https://phabricator.wikimedia.org/P10633 and previous config saved to /var/cache/conftool/dbconfig/20200306-064800-marostegui.json
* 01:38 mutante: added 9 more appservers to codfw pool split between appserver and API appservers, weight 15 (like all in codfw) [[phab:T247021|T247021]]
* 01:37 mutante: added 9 more appservers to codfw pool
* 01:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw230[1-9].codfw.wmnet
* 01:34 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw230[1-9].codfw.wmnet
* 01:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:33 cdanis: repool esams [[phab:T246338|T246338]]
* 00:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:02 cdanis: [[phab:T246338|T246338]] depool esams for router maintenance
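
The 18:04 to 18:07 entries above address the same ten codfw hosts in two notations: the conftool selectors use regular-expression character classes (mw232[5-9], mw233[0-4]), while the cumin command uses a numeric range (mw23[25-34]). The sketch below checks that the two cover the same hosts. It assumes the conftool name selector is matched as a full-string regular expression and that the cumin bracket is an inclusive numeric range. The expand_range helper is hypothetical and handles only a single simple bracket, not the full host-range syntax.

```python
import re


def expand_range(pattern):
    """Expand one 'prefix[a-b]suffix' numeric range into a list of host names.

    Hypothetical helper: supports a single inclusive range only.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]
    prefix, lo, hi, suffix = m.groups()
    width = len(lo)  # keep zero padding if the range uses it
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(lo), int(hi) + 1)]


# Range notation as used in the cumin command.
hosts = expand_range("mw23[25-34].codfw.wmnet")

# Regex notation as used in the conftool selectors.
selectors = [r"mw232[5-9].codfw.wmnet", r"mw233[0-4].codfw.wmnet"]
matched = [h for h in hosts if any(re.fullmatch(s, h) for s in selectors)]

print(len(hosts), hosts[0], hosts[-1])
print(matched == hosts)
```

Under these assumptions the range expands to mw2325 through mw2334 and every expanded host is matched by one of the two selectors.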
 
== 2020-03-05 ==
* 23:55 mutante: pooled mw2290 - noticed it was the only API appserver in codfw not pooled but did not see why, fine in Icinga and no open tickets/SAL
* 23:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet
* 23:30 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 23:27 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw1413.eqiad.wmnet
* 23:26 rlazarus: mw1413 test-reimage completed successfully, pooling
* 23:03 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:01 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 22:50 mutante: added 8 new appservers to pool in eqiad
* 22:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw139[0-2].eqiad.wmnet
* 22:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[5-9].eqiad.wmnet
* 22:47 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw138[5-9].eqiad.wmnet
* 22:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw139[0-2].eqiad.wmnet
* 22:46 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw139[0-2].eqiad.wmnet
* 22:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[5-9]eqiad.wmnet
* 22:42 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw139[0-2]eqiad.wmnet
* 22:41 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw138[5-9]eqiad.wmnet
* 22:41 rlazarus: reimaging mw1413 (new appserver, not pooled) to test https://gerrit.wikimedia.org/r/c/576464
* 22:40 mutante: [cumin1001:~] $ sudo -i cumin -b 15 'mw13[85-92].eqiad.wmnet' 'sudo -u dzahn scap pull'
* 22:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(0[5-9]{{!}}1[0-2]).eqiad.wmnet
* 22:40 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw14(0[5-9]{{!}}1[0-2]).eqiad.wmnet
* 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [ukwikinews] Add HD logos (duration: 00m 59s)
* 22:35 eileen: civicrm revision changed from {{Gerrit|62e62e107c}} to {{Gerrit|10506a9644}}, config revision is {{Gerrit|734a7bfadd}}
* 22:34 jforrester@deploy1001: Synchronized static/images/project-logos/: [ukwikinews] Provide HD logos (duration: 00m 59s)
* 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 56s)
* 22:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [fawikivoyage] Add custom logos (duration: 00m 58s)
* 22:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 59s)
* 22:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use HD logos at arwikibooks, cawikibooks, and plwikivoyage (duration: 00m 59s)
* 22:17 jforrester@deploy1001: Synchronized static/images/project-logos/: Provide HD logos for arwikibooks, cawikibooks, and plwikivoyage (duration: 01m 00s)
* 22:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 59s)
* 22:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use HD logos at bnwikibooks, bnwikisource, and ukwikivoyage (duration: 00m 59s)
* 22:10 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:09 jforrester@deploy1001: Synchronized static/images/project-logos/: Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage (duration: 01m 00s)
* 22:07 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 22:05 rzl@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97)
* 22:05 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 22:01 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Stop loading four old logo dblists (duration: 00m 59s)
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:40 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 20:39 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update label blacklist (once more for good measure) (duration: 00m 57s)
* 20:37 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update label blacklist (duration: 00m 59s)
* 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync for bug 236104 (duration: 00m 56s)
* 19:45 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:577325{{!}}Switch GrowthExperiments topic search to ORES (T240517)]] (duration: 00m 58s)
* 19:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:10 ebernhardson@deploy1001: Synchronized wmf-config/SearchSettingsForWikibase.php: (no justification provided) (duration: 00m 57s)
* 18:33 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:32 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:30 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:30 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:28 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:27 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:27 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:26 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:40 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:37 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I8f0d82164}}, {{Gerrit|Iaac7cbfbb9}} (no-op) (duration: 00m 59s)
* 17:32 elukey: run homer on cumin1001 to apply https://gerrit.wikimedia.org/r/576873 on cr1/cr2-eqiad
* 17:27 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:27 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:24 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:24 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:58 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:58 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078 after reimage to buster [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10631 and previous config saved to /var/cache/conftool/dbconfig/20200305-165555-marostegui.json
* 16:55 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:50 krinkle@deploy1001: Synchronized dblists/: {{Gerrit|I22a3c2a82f7be4a}} (duration: 00m 57s)
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10630 and previous config saved to /var/cache/conftool/dbconfig/20200305-164319-marostegui.json
* 16:22 marostegui: Restart tendril/dbtree database
* 16:18 _joe_: repooling mw1394
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10629 and previous config saved to /var/cache/conftool/dbconfig/20200305-161222-marostegui.json
* 16:01 elukey: depool mw1394
* 16:01 Krinkle: mw1394 (api_appserver) is fatalling search-related api requests due to "Elastic down?"
* 15:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:26 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:26 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:24 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:24 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster [[phab:T246604|T246604]]', diff saved to https://phabricator.wikimedia.org/P10627 and previous config saved to /var/cache/conftool/dbconfig/20200305-151858-marostegui.json
* 15:18 _joe_: fixing the envoy installation on mw1394-1404, running scap pull
* 15:15 XioNoX: add SNMP community to Juniper devices
* 15:01 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:01 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:55 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:55 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:52 moritzm: copied hpssacli to thirdparty/hwraid for buster-wikimedia (current Gen 10 releases are named ssaducli now, but retain the old package (which only uses libc anyway) for backwards compat with gen9 on Buster)