Server Admin Log

== 2019-12-03 ==
* 01:00 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.8/extensions/Translate/sql/translate_<nowiki>{</nowiki>…<nowiki>}</nowiki>.sql [[phab:T239091|T239091]]
* 00:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T239091|T239091]] Enable Translate extension on sewikimedia (duration: 00m 57s)
* 00:54 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.5/extensions/Wikibase/client/sql/entity_usage.sql
* 00:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Echo/includes/DiscussionParser.php: [[phab:T239275|T239275]] Fix type hint fatal from getUserLinks() (duration: 01m 16s)
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
== 2019-12-02 ==
* 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
* 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
* 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
* 23:05 mutante: mw2248 - restart nginx (for some reason unit was running but not listening on 443 after reimage..now it does)
* 23:05 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:02 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 22:46 ejegg: updated payments-wiki from {{Gerrit|06a8c3cdff}} to {{Gerrit|f61c9f0692}}
* 22:44 bblack: reimaging dns4002 to buster - [[phab:T239667|T239667]]
* 22:07 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Update text for no personal uploads message ([[phab:T238873|T238873]]) (duration: 01m 03s)
* 22:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
* 21:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
* 21:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
* 21:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 21:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 21:22 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P9796 and previous config saved to /var/cache/conftool/dbconfig/20191202-205904-marostegui.json
* 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=nginx,dc=codfw
* 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=apache2,dc=codfw
* 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=nginx,cluster=appserver
* 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=apache2,cluster=appserver
* 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=nginx,cluster=appserver,dc=codfw
* 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=apache2,cluster=appserver,dc=codfw
* 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=nginx,dc=codfw
* 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=apache2,dc=codfw
* 20:36 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch Flow on all wikis to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 00m 59s)
* 20:35 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 20:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 20:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 14m 59s)
* 20:12 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2) (duration: 00m 05s)
* 20:12 joal@deploy1001: Started deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2)
* 20:08 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2) (duration: 08m 08s)
* 20:06 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - [[phab:T229015|T229015]]
* 20:05 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labslabslabs (duration: 01m 08s)
* 20:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP (duration: 02m 48s)
* 20:02 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP
* 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:59 joal@deploy1001: Started deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2)
* 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:56 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:51 ariel@cumin1001: START - Cookbook sre.hosts.downtime
* 19:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:50 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:50 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
* 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
* 19:48 ariel@cumin1001: START - Cookbook sre.hosts.downtime
* 19:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 13m 48s)
* 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:23 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - [[phab:T229015|T229015]]
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP (duration: 06m 38s)
* 19:16 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP
* 19:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - [[phab:T229015|T229015]] (duration: 14m 11s)
* 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 18:50 mobrovac@deploy1001: Started deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - [[phab:T229015|T229015]]
* 18:39 joal@deploy1001: Finished deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy (duration: 00m 06s)
* 18:39 joal@deploy1001: Started deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy
* 18:38 joal@deploy1001: Finished deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy (duration: 08m 21s)
* 18:32 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 18:30 joal@deploy1001: Started deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy
* 18:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes (duration: 15m 42s)
* 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:00 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes
* 17:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - [[phab:T229015|T229015]] (duration: 14m 06s)
* 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 17:42 mobrovac@deploy1001: Started deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - [[phab:T229015|T229015]]
* 17:29 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning [[phab:T230495|T230495]] (duration: 01m 14s)
* 17:28 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning [[phab:T230495|T230495]]
* 17:21 ssastry@deploy1001: Finished deploy [parsoid/deploy@743efb0]: Updating Parsoid to {{Gerrit|ca588b25}} + fix broken langconv library / deploy (duration: 07m 48s)
* 17:14 ssastry@deploy1001: Started deploy [parsoid/deploy@743efb0]: Updating Parsoid to {{Gerrit|ca588b25}} + fix broken langconv library / deploy
* 17:09 ejegg: disabled fundraising job omnimail_groupmember_load
* 16:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:43 ejegg: updated fundraising internal dashboard from {{Gerrit|8fc2726736}} to {{Gerrit|3a93d2aba4}}
* 16:43 effie: restart all API cluster in eqiad
* 16:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:42 hashar: Restarted CI Jenkins
* 16:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 13m 53s)
* 16:41 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=global [[phab:T238494|T238494]]
* 16:32 ema: cp3053: repooling after firmware update [[phab:T239041|T239041]]
* 16:27 mobrovac@deploy1001: Started deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - [[phab:T229015|T229015]]
* 16:19 effie: reimage mw1295.eqiad.wmnet mw1294.eqiad.wmnet  mw1293.eqiad.wmnet
* 16:11 robh: cp3053 depooling and rebooting for firmware update [[phab:T239041|T239041]]
* 16:10 robh: cp3035 depooling and rebooting for firmware update [[phab:T239041|T239041]]
* 15:38 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 15:38 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid VRS: Switch groups 0 and 1 to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 00m 59s)
* 15:35 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - [[phab:T239607|T239607]] (duration: 14m 51s)
* 15:26 effie: Rolling restart mw1345-1348
* 15:15 mobrovac@deploy1001: Started deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - [[phab:T239607|T239607]]
* 14:46 ema: cp-ats: set server_session_sharing.match=2 everywhere (puppet re-enable and run) [[phab:T238494|T238494]]
* 14:31 ema: cp-ats: merge server_session_sharing.match=2 (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553490/) with puppet disabled, test on cp3050 [[phab:T238494|T238494]]
* 14:18 godog: set grafana theme back to light, was dark for some reason
* 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P9794 and previous config saved to /var/cache/conftool/dbconfig/20191202-135643-marostegui.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P9793 and previous config saved to /var/cache/conftool/dbconfig/20191202-135543-marostegui.json
* 13:47 ema: power-cycle cp3053 [[phab:T239041|T239041]]
* 13:44 hashar: Restarted CI Jenkins
* 13:30 hashar: Restarted CI Jenkins
* 13:14 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - [[phab:T229015|T229015]] (duration: 14m 49s)
* 13:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - [[phab:T229015|T229015]]
* 12:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes (duration: 02m 54s)
* 12:54 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes
* 12:54 Urbanecm: EU SWAT done
* 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|d27fe78}}: Enable partial blocks on eswiki ([[phab:T239370|T239370]]) (duration: 01m 00s)
* 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|445bdc3}}: Remove `move-rootuserpages` from user on svwiki ([[phab:T238842|T238842]]) (duration: 01m 04s)
* 12:43 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki*.png
* 12:39 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|61a9563}}: Revert "Change bawiki logo to an anniversary one" ([[phab:T237070|T237070]]) (duration: 01m 06s)
* 12:37 effie: reimage mw1296.eqiad.wmnet
* 12:37 effie: reimage mw1298.eqiad.wmnet
* 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:554049{{!}}Set read new for term store for items of wikidata up to Q1000 (T225057)]] (duration: 01m 00s)
* 12:19 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/GrowthExperiments/: SWAT: [[gerrit:553402{{!}}Suggested edits: do not treat AQS lookup failure as error (T238178)]] (duration: 01m 02s)
* 11:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:50 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:554033{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:554033{{!}} Bumping portals to master (T128546)]] (duration: 01m 04s)
* 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 moritzm: installing ruby2.1 security updates
* 10:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:43 moritzm: installing python-psutil security updates
* 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:42 effie: reimage mw1299.eqiad.wmnet
* 10:18 effie: reimage mw1290.eqiad.wmnet
* 10:18 effie: reimage  mw1275.eqiad.wmnet
* 10:15 moritzm: installing file/libmagic regression update for jessie
* 10:08 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
* 09:52 godog: swift eqiad-prod: more weight to ms-be105[7-9] - [[phab:T237438|T237438]]
* 09:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:41 joal@deploy1001: Finished deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin) (duration: 00m 08s)
* 09:41 joal@deploy1001: Started deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin)
* 09:40 joal@deploy1001: Finished deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week (duration: 18m 22s)
* 09:23 effie: reimage mw1300.eqiad.wmnet
* 09:23 effie: reimage mw1300.eqiad.wmnet
* 09:22 joal@deploy1001: Started deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week
* 09:16 moritzm: installing libvpx security updates
* 09:14 godog: extend graphite LVs on graphite1004 / graphite2003 by 200G
* 08:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 effie: reimage mw1287.eqiad.wmnet mw1288.eqiad.wmnet mw1289.eqiad.wmnet
* 08:08 effie: reimage mw1301.eqiad.wmnet
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:18 andrewbogott: forcing a reboot of cloudstore1008 via mgmt console — it seems to have locked up
* 06:43 Urbanecm: Clear account creation throttle for several IPs ([[phab:T239465|T239465]])
* 06:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for cawiki workshop ([[phab:T239465|T239465]]) (duration: 01m 03s)
* 06:00 marostegui: Compress s8 codfw master (lag might appear on codfw s8)
* 06:00 marostegui: Compress s4 codfw master (lag might appear on codfw s4)
* 05:56 marostegui: Deploy schema change on db1075
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P9791 and previous config saved to /var/cache/conftool/dbconfig/20191202-055546-marostegui.json
* 05:53 marostegui: Compress db1099:3318 [[phab:T235599|T235599]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for compression', diff saved to https://phabricator.wikimedia.org/P9790 and previous config saved to /var/cache/conftool/dbconfig/20191202-055245-marostegui.json
== 2019-12-01 ==
* 23:27 ladsgroup@deploy1001: Started restart [mobileapps/deploy@70154b4]: Rolling restart of mobileapps
* 23:20 bblack: restarting AQS services in eqiad
* 23:15 eileen: process-control config revision is {{Gerrit|9750c318a0}} - jobs disabled
* 21:39 andrewbogott: restarted nova conductor and api on cloudcontrol1003 and 1004 to free up db connections ([[phab:T239168|T239168]])
== 2019-11-30 ==
* 15:47 Urbanecm: Reset email of SUL user Hayk.arabaget ([[phab:T239462|T239462]])
* 07:40 vgutierrez: repooling cp3057 - [[phab:T239502|T239502]]
* 07:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
* 07:30 vgutierrez: depool and powercycle cp3057 - [[phab:T239502|T239502]]
== 2019-11-29 ==
* 22:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:12 effie: reimage  mw1302.eqiad.wmnet
* 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 19:19 effie: reimage mw1284.eqiad.wmnet
* 19:19 effie: reimage mw1303.eqiad.wmnet mw1283.eqiad.wmnet
* 17:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
* 16:17 effie: reimage mw1274.eqiad.wmnet
* 16:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 effie: reimage mw1282.eqiad.wmnet
* 14:45 effie: reimage mw1282.eqiad.wmnet
* 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:36 effie: reimage mw1323.eqiad.wmnet mw1297.eqiad.wmnet mw1273.eqiad.wmnet
* 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:14 filippo@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
* 14:13 godog: reimage mw2228 for partman tests
* 14:02 effie: reimage mw1271.eqiad.wmnet mw1272.eqiad.wmnet mw1304.eqiad.wmnet
* 13:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 jynus: reenable puppet on dbprov2001, backup1001
* 13:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:48 jynus: disabling puppet also on backup1001 to test recoveries
* 12:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 effie: reimage mw1305.eqiad.wmnet mw1265.eqiad.wmnet mw1270.eqiad.wmnet
* 11:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:39 jynus: disabling puppet on dbprov2001 to test recoveries
* 11:34 effie: reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet  mw1281.eqiad.wmnet
* 11:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 Lucas_WMDE: <effie> 10:58:17 log reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmnet
* 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:47 elukey@deploy1001: Finished deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts (duration: 00m 08s)
* 10:47 elukey@deploy1001: Started deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts
* 10:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:22 effie: reimage mw1306.eqiad.wmnet mw1264.eqiad.wmnet mw1279.eqiad.wmnet
* 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:33 marostegui: Remove triggers from db2094:3313 - [[phab:T234704|T234704]]
* 09:33 marostegui: Stop replication on db2105 (s3 codfw) for schema change
* 09:23 effie: reimage mw1263.eqiad.wmnet mw1307.eqiad.wmnet
* 09:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:01 volans: temporarily disabling puppet on 'R:keyholder::agent' to merge gerrit:operations/puppet/+/553460 - [[phab:T239386|T239386]]
* 09:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:18 effie: reimage mw2223.codfw.wmnet  mw2222.codfw.wmnet mw2221.codfw.wmnet  mw2220.codfw.wmnet
* 07:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 07:25 effie: reimage mw1312.eqiad.wmnet mw1308.eqiad.wmnet  mw1261.eqiad.wmnet
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P9781 and previous config saved to /var/cache/conftool/dbconfig/20191129-055845-marostegui.json
* 05:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.5/includes/exception/MWExceptionHandler.php: {{Gerrit|532f4aba96d85}} (duration: 01m 03s)
== 2019-11-28 ==
* 23:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 23:21 effie: reimage mw1329.eqiad.wmnet
* 23:01 effie: restart cp1087
* 22:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 22:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 22:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 21:19 effie: reimage mw1309.eqiad.wmnet
* 21:19 effie: reimage mw1323.eqiad.wmnet
* 21:11 effie: reimage  mw1316.eqiad.wmnet  mw1315.eqiad.wmnet
* 20:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 20:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 20:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:03 effie: reimage mw1313.eqiad.wmnet
* 20:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 19:48 effie: reimage mw1331.eqiad.wmnet mw1330.eqiad.wmnet mw1310.eqiad.wmnet
* 18:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 18:41 marostegui: Deploy schema change on db1134
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P9780 and previous config saved to /var/cache/conftool/dbconfig/20191128-183918-marostegui.json
* 18:29 effie: reimage mw1319.eqiad.wmnet mw1318.eqiad.wmnet
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P9779 and previous config saved to /var/cache/conftool/dbconfig/20191128-180517-marostegui.json
* 17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 17:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 17:19 effie: reimage mw1340.eqiad.wmnet mw1339.eqiad.wmnet
* 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:32 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:18 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:58 effie: reimage mw1311.eqiad.wmnet
* 15:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:28 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 effie: reimage mw1333.eqiad.wmnet mw1332.eqiad.wmnet mw1331.eqiad.wmnet
* 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 effie: reimage mw1343.eqiad.wmnet mw1342.eqiad.wmnet mw1341.eqiad.wmnet
* 14:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:20 marostegui: Deploy schema change on s3 codfw on the master, lag will appear on s3 codfw ([[phab:T234066|T234066]])
* 13:57 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 5 ([[phab:T237984|T237984]])
* 13:57 marostegui: Deploy schema change on s4 codfw master with replication - [[phab:T234066|T234066]]
* 13:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:37 marostegui: Deploy schema change on db1106 with replication (lag will appear on s1 on labs) - [[phab:T234066|T234066]] [[phab:T233135|T233135]]
* 13:37 marostegui: Recreate views for enwiki_p.protected_titles for all labsdb hosts - [[phab:T233135|T233135]]
* 13:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:33 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
* 13:31 marostegui: Remove ar_comment triggers from db1124:3311 for enwiki.archive - [[phab:T234704|T234704]]
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change, temporarily pool db1080 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9778 and previous config saved to /var/cache/conftool/dbconfig/20191128-133013-marostegui.json
* 13:28 volans: cleanup root's crontab entries on netmon hosts from netbox/postgres stuff - [[phab:T238919|T238919]]
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P9777 and previous config saved to /var/cache/conftool/dbconfig/20191128-132647-marostegui.json
* 13:21 volans: cumin 'netmon*' 'rm -v /var/spool/cron/crontabs/postgres' [[phab:T238919|T238919]]
* 13:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:15 effie: enable puppet on thumbor*
* 13:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:51 effie: disable puppet on thumbor*
* 12:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:59 effie: reimage mw1267.eqiad.wmnet mw1277.eqiad.wmnet
* 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:36 effie: reimage mw1344.eqiad.wmnet mw1334.eqiad.wmnet mw1324.eqiad.wmnet
* 11:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:55 effie: reimage mw2279 mw2278 mw2277 mw2276 mw2275
* 10:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:39 marostegui: Compress labsdb1009
* 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:51 godog: swift eqiad-prod: more weight to ms-be105[7-9] - [[phab:T237438|T237438]]
* 09:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:17 effie: reimage mw1266, mw1276
* 09:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:56 marostegui: Compress labsdb1011
* 08:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 08:19 marostegui: Remove m4 from tendril and zarcillo - [[phab:T159170|T159170]]
* 08:15 effie: reimage mw2280, mw2281, mw2282
* 08:06 marostegui: Compress labsdb1012
* 07:56 effie: reimage mw1345, mw1335, mw1325
* 06:56 elukey: remove log files on an-tool1007 to free root partition space
* 06:14 marostegui: Remove db1061 from tendril and zarcillo - [[phab:T238624|T238624]]
* 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:02 marostegui: Remove db2067 from tendril and zarcillo [[phab:T233185|T233185]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P9776 and previous config saved to /var/cache/conftool/dbconfig/20191128-055212-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P9775 and previous config saved to /var/cache/conftool/dbconfig/20191128-055025-marostegui.json
* 03:03 vgutierrez: restarting keyholder on acmechief[12]001
* 01:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:59 mutante: mw2244 restart php-fpm and apache which somehow are returning 5xx after reimage
* 00:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
== 2019-11-27 ==
* 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 mutante: mw2215 scap pull
* 21:30 mutante: mw2215 rebooting
* 21:10 bblack: restarting acme-chief service on acmechief1001 (daemon appears to be stuck on a lock and nonfunctional for days...)
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:14 cstone: payments-wiki revision changed from {{Gerrit|2eb54fd6ef}} to {{Gerrit|06a8c3cdff}}
* 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P9773 and previous config saved to /var/cache/conftool/dbconfig/20191127-193528-marostegui.json
* 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P9772 and previous config saved to /var/cache/conftool/dbconfig/20191127-193227-marostegui.json
* 19:32 ebernhardson@deploy1001: Finished deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient (duration: 00m 45s)
* 19:31 ebernhardson@deploy1001: Started deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient
* 19:27 mutante: an-airflow1001 - apt-get install python3-mysqldb - start airflow-webserver
* 19:24 ebernhardson@deploy1001: Finished deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package (duration: 00m 42s)
* 19:23 ebernhardson@deploy1001: Started deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package
* 19:08 ebernhardson@deploy1001: Finished deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance (duration: 00m 40s)
* 19:08 ebernhardson@deploy1001: Started deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance
* 19:00 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg (it tries to write this on first start and did not have permissions to do so) [[phab:T236180|T236180]]
* 18:58 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg
* 18:57 eileen: process-control config revision is {{Gerrit|b95355c0c0}} - repair omnirecipient job off
* 16:57 andrewbogott: disabling puppet on cloudvirt* and cloudcontrol* while merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552894/
* 16:50 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external
* 16:32 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: {{Gerrit|dd4c76d3d}} SpecialContributions: max concurrency 3 (instead of 10) [[phab:T234450|T234450]] (duration: 01m 17s)
* 16:22 ejegg: shifted daily silverpop export start time one hour earlier
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P9768 and previous config saved to /var/cache/conftool/dbconfig/20191127-161525-marostegui.json
* 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P9767 and previous config saved to /var/cache/conftool/dbconfig/20191127-161450-marostegui.json
* 16:06 ema: cp3050: set proxy.config.http.server_session_sharing.match to "ip" [[phab:T238494|T238494]]
* 15:57 _joe_: restarting pybal on lvs1015
* 15:56 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:55 _joe_: restarting pybal on lvs1016
* 15:52 jynus: disabling puppet on dbprov1001 to test bacula restore [[phab:T238048|T238048]]
* 15:47 papaul: testing redundancy power on scs-a1-codfw
* 15:47 _joe_: restarting pybal on lvs2003
* 15:44 _joe_: restarting pybal again on lvs2006
* 15:42 jynus: migrate db entries of archive Media to backup1001 [[phab:T238048|T238048]]
* 15:37 marostegui: Logging retroactively for the record: drop user 'nova'@'%' from m5 - [[phab:T239170|T239170]]
* 15:30 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 marostegui: Add grants for dump (10.192.0.114,10.192.16.96) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - [[phab:T239170|T239170]]
* 15:27 marostegui: Add grants for dump (10.64.0.95,10.64.16.31) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - [[phab:T239170|T239170]]
* 15:25 _joe_: restarting lvs2006 for addition of eventgate-logging-external,blubberoid-https
* 15:24 moritzm: installing freetype bugfix updates from Buster 10.2 point release
* 15:21 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=eventgate-logging-external
* 15:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 moritzm: downgrading trapperkeeper-webserver-jetty9-clojure packages on puppetdb hosts to the version shipped in Buster 10.2
* 15:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 ema: cp-ats: rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki> restart to enable lua reload [[phab:T233274|T233274]]
* 15:02 moritzm: remove trapperkeeper-webserver-jetty9-clojure debs from apt.wikimedia.org/buster-wikimedia (these were needed to unbreak TLS on Puppetdb in Buster, but an update landed in Buster 10.2, which replaces our custom hotfix)
* 14:56 marostegui: Add new grants for nova_cell0 database on m5 - [[phab:T239170|T239170]]
* 14:50 marostegui: Create nova_cell0 database on m5 master - [[phab:T239170|T239170]]
* 14:43 effie: reimage mw1346, mw1336, mw1326
* 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 effie: reimage mw2285, mw2284, mw2283
* 14:14 effie: reimage mw2285, mw2286, mw2283
* 14:01 moritzm: temporarily stop cas on idp1001 for some failover tests
* 14:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of testwikidatawiki to read from the new term store for items ([[phab:T225057|T225057]]) (duration: 00m 56s)
* 13:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:42 ema: cp1075: repool with tslua reloads enabled [[phab:T233274|T233274]]
* 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 13:28 ema: cp1075: ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki> restarted to apply tslua reload changes [[phab:T233274|T233274]]
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P9766 and previous config saved to /var/cache/conftool/dbconfig/20191127-132359-marostegui.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9765 and previous config saved to /var/cache/conftool/dbconfig/20191127-132220-marostegui.json
* 13:21 effie: reimage mw2288, mw2287, mw2286
* 13:13 effie: reimage mw1348, mw1338, mw1328
* 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
* 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
* 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
* 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=apache2,cluster=api_appserver,name=mw2289.codfw.wmnet
* 12:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=nginx
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=nginx
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=nginx
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=apache2
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=apache2
* 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=apache2
* 12:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 12:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 12:18 apergos: reimaged dumpsdata1001 to buster and forgot to use the dang script but it is all ok anyhow :-P
* 11:47 Amir1: deployed security patch for [[phab:T237667|T237667]]
* 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=nginx
* 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=nginx
* 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=apache2
* 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=apache2
* 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=nginx
* 11:27 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=apache2
* 11:21 effie: reimage mw2289.codfw.wmnet
* 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:06 ema: cp1075: depool to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552955/ and test tslua reloads [[phab:T233274|T233274]]
* 11:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:04 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:43 effie: reimage mw1347,mw1337,mw1327 - [[phab:T239054|T239054]]
* 10:32 ariel@deploy1001: Finished deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files (duration: 00m 03s)
* 10:32 ariel@deploy1001: Started deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files
* 09:41 moritzm: installing symfony security updates
* 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 09:29 moritzm: installing php-imagick security updates
* 09:25 ema: cp3050: re-enable request coalescing after performance experiment [[phab:T238494|T238494]]
* 09:02 effie: reimage mw1317.eqiad.wmnet - [[phab:T239054|T239054]]
* 09:01 marostegui: Stop replication on db1124:3318 to reimport wikidatawiki.page table on labsdb1010 - [[phab:T238399|T238399]]
* 08:24 godog: silence codfw varnish traffic drop until dec 9th - [[phab:T239039|T239039]]
* 08:09 godog: swift eqiad-prod: more weight to ms-be105[7-9] - [[phab:T237438|T237438]]
* 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 07:53 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 07:51 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 07:49 elukey: roll restart of eventstreams on scb2* - [[phab:T239220|T239220]]
* 07:41 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 07:15 vgutierrez: repooling cp3063 - [[phab:T239310|T239310]]
* 07:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3063.esams.wmnet
* 07:04 vgutierrez: depool & powercycle cp3063 - [[phab:T239310|T239310]]
* 07:03 marostegui: Compress tables on db1102:3314
* 06:52 marostegui: Remove db2062 from tendril and zarcillo - [[phab:T238726|T238726]]
* 06:50 marostegui: Stop MySQL on db2062 - [[phab:T238726|T238726]]
* 06:25 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 06:05 marostegui: Promote db2135 to codfw m5 master [[phab:T238183|T238183]]
* 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2135 to the config [[phab:T238183|T238183]] (duration: 00m 59s)
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2135 to the config [[phab:T238183|T238183]] (duration: 01m 11s)
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2125 [[phab:T239042|T239042]]', diff saved to https://phabricator.wikimedia.org/P9759 and previous config saved to /var/cache/conftool/dbconfig/20191127-054809-marostegui.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9758 and previous config saved to /var/cache/conftool/dbconfig/20191127-054056-marostegui.json
* 01:58 krinkle@deploy1001: Synchronized vendor: {{Gerrit|4108ff4e2}} (3/3) (duration: 01m 00s)
* 01:56 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|4108ff4e2}} (2/3) (duration: 00m 59s)
* 01:55 krinkle@deploy1001: Synchronized lib/: {{Gerrit|4108ff4e2}} (1/3) (duration: 01m 01s)
* 01:28 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 03s)
* 00:05 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Show UploadWizard CTA on testcommonswiki ([[phab:T234960|T234960]]) (duration: 01m 00s)
== 2019-11-26 ==
* 23:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WelcomeSurvey for 100% of new users on arwiki (duration: 01m 02s)
* 23:25 eileen: process-control config revision is {{Gerrit|ad80b0136c}}
* 20:33 jforrester@deploy1001: Synchronized dblists/: Update dblists, now autogenerated (no-op, just comment changes) [[phab:T223602|T223602]] (duration: 01m 01s)
* 20:25 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c282e86]: Followup on [[phab:T230495|T230495]] (duration: 00m 59s)
* 20:24 ebernhardson@deploy1001: Finished deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3 (duration: 00m 42s)
* 20:24 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c282e86]: Followup on [[phab:T230495|T230495]]
* 20:24 ebernhardson@deploy1001: Started deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3
* 20:06 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs [[phab:T230495|T230495]] (duration: 01m 23s)
* 20:05 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs [[phab:T230495|T230495]]
* 19:59 Pchelolo: create partitioned topics for cirrusSearchElasticaWrite on kafka-main [[phab:T239135|T239135]]
* 19:57 Urbanecm: Reset email of TheklanBot ([[phab:T239233|T239233]])
* 19:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.8
* 19:39 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache (duration: 32m 52s)
* 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9753 and previous config saved to /var/cache/conftool/dbconfig/20191126-192724-marostegui.json
* 19:22 shdubsh: restore codfw logstash to baseline - [[phab:T215904|T215904]]
* 19:09 shdubsh: stop logstash codfw, generate some consumer lag, and set batch size to 2000 - [[phab:T215904|T215904]]
* 19:07 ebernhardson@deploy1001: Finished deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml (duration: 00m 29s)
* 19:07 ebernhardson@deploy1001: Started deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml
* 19:06 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache
* 19:04 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.2 (duration: 07m 08s)
* 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 05s)
* 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
* 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 02s)
* 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
* 18:55 shdubsh: stop logstash codfw, generate some consumer lag - [[phab:T215904|T215904]]
* 18:44 shdubsh: temporarily update pipeline.batch.size to 1000 on logstash2004 - [[phab:T215904|T215904]]
* 18:33 shdubsh: stop logstash on logstash200[5-6] for metrics collection - [[phab:T215904|T215904]]
* 18:09 brennen: issues with branch.py branch cut; deleted stub wmf/1.35.0-wmf.8 branch and proceeding with standard process
* 17:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Show UploadWizard CTA in beta ([[phab:T234960|T234960]]) (duration: 00m 52s)
* 17:31 brennen: cutting branch for 1.35.0-wmf.8
* 17:26 paravoid: moving fiberring from cr3-esams:xe-0/0/2 to cr2-esams:xe-0/1/8
* 17:25 ppchelko@deploy1001: Finished deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP [[phab:T229015|T229015]] (duration: 15m 38s)
* 17:10 ppchelko@deploy1001: Started deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP [[phab:T229015|T229015]]
* 17:03 paravoid: above was for cr3-esams
* 17:03 paravoid: cr2-esams: disable interface xe-0/0/2 (transit)
* 16:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop Scribunto special-case for HHVM, never reached [[phab:T235142|T235142]] (duration: 00m 52s)
* 16:32 jforrester@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: Drop HHVMRequestInit symlink creation (duration: 00m 52s)
* 16:31 James_F: No sane way to delete HHVMRequestInit.php with a simple sync-dir, so waiting for the full scap.
* 16:30 jforrester@deploy1001: Synchronized docroot/noc/conf/: Drop HHVMRequestInit symlink (duration: 00m 52s)
* 16:27 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Update Parsoid to {{Gerrit|7b9b424a}} (duration: 08m 37s)
* 16:19 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Update Parsoid to {{Gerrit|7b9b424a}}
* 16:10 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Testing rollback fixes ([[phab:T238685|T238685]]) (duration: 01m 07s)
* 16:09 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Testing rollback fixes ([[phab:T238685|T238685]])
* 16:01 ema: cp3050: temporarily disable request coalescing to assess performance impact [[phab:T238494|T238494]]
* 15:15 ema: cp3050: repool after failed test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ (reverted) [[phab:T238494|T238494]]
* 14:55 bblack: ignore previous message, restarts not necessary
* 14:53 bblack: rolling through authdns daemon restarts (necessary to reconfigure ANY-address listener) on authdns1001, authdns2001, ganeti3003
* 14:44 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Raise memory limit on parsoid servers 2/2 (duration: 00m 52s)
* 14:42 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Raise memory limit on parsoid servers 1/2 (duration: 00m 51s)
* 14:30 oblivian@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 14:05 ema: cp3050: depool to merge and test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ [[phab:T238494|T238494]]
* 13:11 effie: enable puppet on mediawiki servers
* 13:03 effie: Remove tmpreaper package from all mediawiki servers - [[phab:T229792|T229792]]
* 12:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:552498{{!}}Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp (T238918)]] (duration: 00m 53s)
* 12:07 XioNoX: power down mr1-esams for replacement - [[phab:T238174|T238174]]
* 11:36 elukey: reboot stat1007
* 11:35 marostegui: Deploy schema change on db1139:3311
* 11:35 effie: enable puppet on mw canary servers, and restart apaches
* 10:50 hashar: Updated jenkins job operations-puppet-tests-stretch-docker to use latest Docker container
* 10:30 godog: swift eqiad-prod: add ms-be105[7-9] - [[phab:T237438|T237438]]
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9749 and previous config saved to /var/cache/conftool/dbconfig/20191126-102442-marostegui.json
* 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:07 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 effie: Disable puppet on all mediawiki servers to test 489982
* 09:26 marostegui: Deploy schema change on s8 primary master (db1109) - [[phab:T234066|T234066]] [[phab:T233135|T233135]] [[phab:T237120|T237120]]
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into s8 vslow,dump', diff saved to https://phabricator.wikimedia.org/P9748 and previous config saved to /var/cache/conftool/dbconfig/20191126-092409-marostegui.json
* 09:18 marostegui: Run maintain-views for wikidatawiki.protected_title view on labsdb hosts [[phab:T233135|T233135]]
* 07:53 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch Flow to Parsoid/PHP on mw.org -- [[phab:T229015|T229015]] (duration: 00m 52s)
* 07:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions [[phab:T234266|T234266]] (duration: 14m 24s)
* 07:29 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions [[phab:T234266|T234266]]
* 07:28 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions (duration: 07m 36s)
* 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1061 from config - [[phab:T238624|T238624]]', diff saved to https://phabricator.wikimedia.org/P9745 and previous config saved to /var/cache/conftool/dbconfig/20191126-071746-marostegui.json
* 07:09 marostegui: Stop MySQL on db1061 - [[phab:T238624|T238624]]
* 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1061 from config [[phab:T238624|T238624]] (duration: 00m 52s)
* 07:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1061 from config [[phab:T238624|T238624]] (duration: 00m 54s)
* 06:51 marostegui: Run compare.py for db2125 - [[phab:T239042|T239042]]
* 06:44 marostegui: Remove triggers for ar_comment on db1124:3318 [[phab:T234704|T234704]]
* 06:43 marostegui: Deploy schema change on db1087 with replication, lag will be generated on s8 for labsdb hosts
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, and pool db1092 temporarily as vslow,dump for s8, for a schema change on db1087', diff saved to https://phabricator.wikimedia.org/P9744 and previous config saved to /var/cache/conftool/dbconfig/20191126-064200-marostegui.json
* 06:34 XioNoX: Rename cr2-knams to cr3-knams - [[phab:T237030|T237030]]
* 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1086 on s7 master and remove read-only from s7 [[phab:T238044|T238044]]', diff saved to https://phabricator.wikimedia.org/P9743 and previous config saved to /var/cache/conftool/dbconfig/20191126-060108-marostegui.json
* 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance [[phab:T238044|T238044]]', diff saved to https://phabricator.wikimedia.org/P9742 and previous config saved to /var/cache/conftool/dbconfig/20191126-060023-marostegui.json
* 06:00 marostegui: Starting s7 failover from db1062 to db1086 - [[phab:T238044|T238044]]
* 05:49 marostegui: Deploy schema change on dbstore1003:3311
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1086 as it will be the new s7 master - [[phab:T238044|T238044]]', diff saved to https://phabricator.wikimedia.org/P9741 and previous config saved to /var/cache/conftool/dbconfig/20191126-051034-marostegui.json
* 05:08 marostegui: Start pre-steps for s7 failover - [[phab:T238044|T238044]]
== 2019-11-25 ==
* 23:39 cstone: payments-wiki revision changed from {{Gerrit|e4d51fe247}} to {{Gerrit|2eb54fd6ef}}
* 23:14 Urbanecm: Evening SWAT done
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
* 23:10 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 01s)
* 23:09 urbanecm@deploy1001: Synchronized dblists/: SWAT: {{Gerrit|aed2369}}: Add gewikimedia to special.dblist ([[phab:T239173|T239173]]) (duration: 00m 52s)
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|d71b0ab}}: kask-echoseen: Do not report dupes ([[phab:T237143|T237143]]) (duration: 00m 53s)
* 22:13 Jeff_Green: authdns update to deploy {{Gerrit|I21ddc1a3e}}
* 22:04 eileen: civicrm revision changed from {{Gerrit|852c4a36bd}} to {{Gerrit|5cf2d2713f}}, config revision is {{Gerrit|c4ad2f5990}}
* 20:37 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1298.eqiad.wmnet
* 20:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
* 20:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 20:07 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 20:05 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 20:04 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
* 19:35 mutante: mw1298 - scap pull
* 19:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 19:30 ema@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet,service=nginx
* 19:14 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
* 19:13 cdanis: restarted grafana-server on grafana1002 [[phab:T220838|T220838]]
* 19:11 cdanis: copied snapshot of database from grafana1001 to grafana1002 [[phab:T220838|T220838]]
* 19:07 cdanis: stopping grafana-next.wikimedia.org (on grafana1002)
* 19:06 cdanis: making grafana.wikimedia.org read-only (on grafana1001): sudo chmod -w /var/lib/grafana/grafana.db
* 18:56 Lucas_WMDE: Morning SWAT done
* 18:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/TemplateData/: SWAT: [[gerrit:552871{{!}}Implement ParsoidFetchTemplateData hook for Parsoid/PHP (T238954)]] (duration: 00m 53s)
* 18:54 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
* 18:54 ema: cumin -b1 'A:cp-ats and A:esams' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
* 18:53 ema: cumin -b1 'A:cp-ats and A:eqsin' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
* 18:53 ema: cumin -b1 'A:cp-ats and A:ulsfo' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
* 18:52 ema: cumin -b1 'A:cp-ats and A:codfw' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
* 18:51 ema: cumin -b1 'A:cp-ats and A:eqiad' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
* 18:50 bblack: cp[245]*: wipe daemon.log and restart syslog, again
* 18:48 mutante: mw1298 - pooling
* 18:26 bblack: cp[245]*: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
* 18:17 bblack: cp4028: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
* 18:16 effie: Restart php-fpm on mw* and wtp* servers in eqiad and codfw - [[phab:T236963|T236963]]
* 18:07 effie: Upgrade php-wikidiff2 to 1.10.0 on all servers - [[phab:T236963|T236963]]
* 17:55 gehel: restart wdqs-updater on all wdqs servers
* 17:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates (duration: 10m 24s)
* 17:50 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch private wiki clients (Flow, VE) to Parsoid/PHP -- [[phab:T229015|T229015]] (duration: 00m 53s)
* 17:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates
* 17:36 marostegui: Upgrade kernel on db2125 [[phab:T239042|T239042]]
* 17:25 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates (duration: 12m 23s)
* 17:19 XioNoX: power down cr2-knams - [[phab:T237030|T237030]]
* 17:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@e7faa19]: Updating Parsoid to {{Gerrit|a6bfdfa}} (duration: 08m 58s)
* 17:12 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates
* 17:05 arlolra@deploy1001: Started deploy [parsoid/deploy@e7faa19]: Updating Parsoid to {{Gerrit|a6bfdfa}}
* 16:48 jynus: upgrading and restarting dbprov* hosts
* 15:49 ema: pool cp3064 with varnish-be [[phab:T227432|T227432]]
* 15:36 ema: cp3064 create filesystem on /dev/nvme0n1p1 (see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552547/) and reboot [[phab:T238494|T238494]]
* 15:22 ema: cp3064 manual reboot after wmf-auto-reimage error: 'Unable to run wmf-auto-reimage-host: Failed to reboot_host' [[phab:T238494|T238494]]
* 15:20 ema: cp-ats: rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki> restart to enable lua reload [[phab:T233274|T233274]]
* 15:18 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 15:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 15:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:11 ema: cp1075: ats-tls-restart to enable lua reload [[phab:T233274|T233274]]
* 15:10 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 15:09 ema@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 ema: cp1075: ats-backend-restart to enable lua reload [[phab:T233274|T233274]]
* 15:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
* 15:00 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp3056.esams.wmnet,service=ats-be
* 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 14:50 XioNoX: enable cr3-esams:et-1/0/0 - [[phab:T236767|T236767]]
* 14:45 ema: depool cp3064 and reimage with varnish-be [[phab:T227432|T227432]]
* 14:44 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 14:38 marostegui: Remove triggers from archive table on s1 codfw sanitarium [[phab:T234704|T234704]]
* 14:37 marostegui: Deploy schema change on s1 codfw (this will generate lag on codfw) - [[phab:T234066|T234066]] [[phab:T233135|T233135]]
* 14:23 moritzm: upgrading OpenJDK 11 on an-conf*
* 14:04 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 13:27 elukey: set global read_only=1 on db1108's log database - [[phab:T159170|T159170]]
* 13:16 XioNoX: cleanup config on cr3-esams - [[phab:T237031|T237031]]
* 13:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 13:11 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 13:06 XioNoX: cleanup config on cr2-esams - [[phab:T237031|T237031]]
* 13:02 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 12:59 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 12:48 XioNoX: bundle esams-knams links on knams side - [[phab:T237031|T237031]]
* 12:42 XioNoX: bundle esams-knams links on esams side - [[phab:T237031|T237031]]
* 12:27 XioNoX: disable BGP to knams transits - [[phab:T237031|T237031]]
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Increase main traffic weight for db1126', diff saved to https://phabricator.wikimedia.org/P9735 and previous config saved to /var/cache/conftool/dbconfig/20191125-114821-marostegui.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P9734 and previous config saved to /var/cache/conftool/dbconfig/20191125-114733-marostegui.json
* 11:40 effie: cumin -b 2 -s 10 restart php on API servers
* 11:31 effie: restart php-fpm on mw1314
* 11:16 Urbanecm: EU SWAT done
* 11:16 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/AbuseFilter/extension.json: SWAT: {{Gerrit|29a16bd}}: Restrict viewing Special:Log/AbuseFilter, and remove from recent changes ([[phab:T34959|T34959]]) (duration: 01m 04s)
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|4670d1d}}: Add throttle rule for WMCL Editathon 2019-12-07 ([[phab:T238986|T238986]]) (duration: 00m 53s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|9394f1f}}: Allow enwikiversity interface admins to remove their own interface administratorship ([[phab:T238967|T238967]]) (duration: 00m 57s)
* 09:45 moritzm: installing cron updates from buster point release
* 09:32 moritzm: installing systemd security/bugfix updates on buster
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - schema change', diff saved to https://phabricator.wikimedia.org/P9732 and previous config saved to /var/cache/conftool/dbconfig/20191125-093157-marostegui.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P9731 and previous config saved to /var/cache/conftool/dbconfig/20191125-093038-marostegui.json
* 09:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: [[phab:T238822|T238822]] (duration: 13m 08s)
* 09:28 _joe_: building and publishing updated images for envoy
* 09:17 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: [[phab:T238822|T238822]]
* 09:13 moritzm: installing python2.7 updates on buster
* 08:53 _joe_: rebuilding base docker images docker-registry.wikimedia.org/wikimedia-<nowiki>{</nowiki>jessie,stretch,buster<nowiki>}</nowiki>
* 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:22 marostegui: Compress db2090
* 07:04 marostegui: Upgrade db2134
* 06:24 marostegui: Compress db2080
* 06:23 marostegui: Compress db2082
* 06:22 marostegui: Compress db2094:3318
* 06:18 marostegui: racadm serveraction hardreset on db2125 [[phab:T239042|T239042]]
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 - schema change', diff saved to https://phabricator.wikimedia.org/P9730 and previous config saved to /var/cache/conftool/dbconfig/20191125-061629-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9729 and previous config saved to /var/cache/conftool/dbconfig/20191125-061542-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9728 and previous config saved to /var/cache/conftool/dbconfig/20191125-060728-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9727 and previous config saved to /var/cache/conftool/dbconfig/20191125-060011-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed [[phab:T239042|T239042]]', diff saved to https://phabricator.wikimedia.org/P9726 and previous config saved to /var/cache/conftool/dbconfig/20191125-055813-marostegui.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9725 and previous config saved to /var/cache/conftool/dbconfig/20191125-055305-marostegui.json
* 03:13 vgutierrez: repooling cp3053 - [[phab:T239041|T239041]]
* 03:00 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3053.esams.wmnet
* 02:59 vgutierrez: depooling & power-cycling cp3053 - [[phab:T239041|T239041]]
* 00:10 eileen: also speed the repair; process-control config revision is {{Gerrit|c4ad2f5990}}
== 2019-11-24 ==
* 20:54 eileen: process-control config revision is {{Gerrit|371782a667}}
* 15:41 ariel@deploy1001: Finished deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps (duration: 00m 03s)
* 15:41 ariel@deploy1001: Started deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps
* 15:01 apergos: rebooting dumpsdata1002 to clear up the other half of the nfs issues
* 14:24 apergos: rebooting snapshot1008 to clear up some nfs + kernel issues
== 2019-11-23 ==
* 18:19 gehel: repool wdqs1007, caught up on lag - [[phab:T238229|T238229]]
* 14:23 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 55s)
* 11:56 _joe_: oblivian@cumin1001:~$ sudo cumin -b2 -s60 A:mw-eqiad 'restart-php7.2-fpm'
* 11:47 _joe_: restarting php7.2-fpm on mw1329
* 09:49 XioNoX: downtime all ripe-atlas checks until Monday (most likely an upstream issue/maintenance)
== 2019-11-22 ==
* 21:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238955|T238955]] (duration: 00m 53s)
* 18:02 shdubsh: restore prometheus services default settings - [[phab:T238807|T238807]]
* 17:52 _joe_: repooling restbase2018
* 17:36 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:34 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 shdubsh: clean tombstones on prometheus1004 - [[phab:T238807|T238807]]
* 17:09 shdubsh: restart prometheus on prometheus1004 - [[phab:T238807|T238807]]
* 16:22 shdubsh: clean tombstones on prometheus1003 - [[phab:T238807|T238807]]
* 15:40 XioNoX: renumber AS17639 sessions in eqsin
* 15:16 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/: Stop outputting anything in case of 304 responses in Special:EntityData ([[phab:T238901|T238901]]) (duration: 00m 57s)
* 14:49 _joe_: disabling puppet on restbase2018, testing envoy upgrade [[phab:T238050|T238050]]
* 14:48 _joe_: uploaded envoyproxy 1.12.1 to <nowiki>{</nowiki>buster,stretch<nowiki>}</nowiki> [[phab:T237235|T237235]]
* 13:11 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T238119|T238119]] [[phab:T238524|T238524]] [[phab:T237375|T237375]] [[phab:T238120|T238120]])
* 13:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: [[phab:T238473|T238473]] (duration: 00m 52s)
* 12:34 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)
* 12:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)
* 11:59 effie: reload php7 on canaries
* 11:34 effie: Roll out wikidiff2 1.10.0-1 to canaries - [[phab:T236963|T236963]]
* 11:29 effie: upload wikidiff2 1.10.0-1 - [[phab:T236963|T236963]]
* 09:59 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 10s)
* 09:56 ladsgroup@deploy1001: Synchronized langlist: [[phab:T238105|T238105]] (duration: 00m 51s)
* 09:47 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
* 09:44 ladsgroup@deploy1001: Synchronized langlist: [[phab:T238104|T238104]] (duration: 00m 52s)
* 09:28 ema: pool cp1081 with ATS backend [[phab:T227432|T227432]]
* 09:27 gehel: depool wdqs1007 to allow it to catch up on lag - [[phab:T238229|T238229]]
* 09:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/includes/specials/pagers/ContribsPager.php: Remove live hack of limit for [[phab:T234450|T234450]] (duration: 00m 54s)
* 09:19 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T234450|T234450]] (duration: 00m 55s)
* 09:07 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 ema@cumin1001: START - Cookbook sre.hosts.downtime
* 09:04 gehel: remove blazegraph 2.1.5-wmf.11 from archiva, broken upload
* 08:54 gehel: restarting blazegraph and updater on wdqs1007
* 08:49 ema: depool cp1081 and reimage as text_ats [[phab:T227432|T227432]]
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Rebalance weights on s7 in preparation for s7 failover on Tuesday [[phab:T238044|T238044]]', diff saved to https://phabricator.wikimedia.org/P9722 and previous config saved to /var/cache/conftool/dbconfig/20191122-063145-marostegui.json
* 03:49 shdubsh: restart prometheus@ops on prometheus1003 [[phab:T238807|T238807]]
* 00:46 mutante: xhgui1001/xhgui2001 - rsyncing /srv/mongod from tungsten to /srv/tungsten/mongod/ on both new machines ([[phab:T158837|T158837]])
* 00:37 mutante: tungsten - starting ferm service
* 00:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move newcomer tasks JSON config from mw.org to local wikis ([[phab:T237301|T237301]]) (duration: 00m 52s)
* 00:18 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Make non-remote titles work in RemotePageConfigurationLoader ([[phab:T237301|T237301]]) (duration: 00m 54s)
== 2019-11-21 ==
* 23:09 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused CirrusSearch config variable (duration: 00m 52s)
* 22:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --overwrite --user=Bürgerentscheid . ([[phab:T238764|T238764]])
* 21:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Revert "Add Machine Vision CTA to final step ([[phab:T234960|T234960]])", take 2 (duration: 00m 41s)
* 21:36 mholloway-shell@deploy1001: Scap failed!: 5/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 21:34 mholloway-shell@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 21:29 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Add Machine Vision CTA to final step ([[phab:T234960|T234960]]) (duration: 00m 59s)
* 21:16 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@70154b4]: Update mobileapps to {{Gerrit|c140e88}} (duration: 06m 29s)
* 21:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@70154b4]: Update mobileapps to {{Gerrit|c140e88}}
* 20:51 mutante: puppetmaster1001 - revoking puppet certs for xhgui1001/xhgui2001
* 20:49 mutante: ganeti1003 - switching boot order of xhgui1001 to network and reinstalling with stretch ([[phab:T238098|T238098]])
* 20:16 mforns@deploy1001: Finished deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist (duration: 08m 29s)
* 20:14 mutante: icinga1001 - systemctl reset-failed
* 20:08 mforns@deploy1001: Started deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist
* 19:01 andrewbogott: upgrading designate to 'ocata' on cloudservices1003 and 1004
* 18:49 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 18:45 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 18:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 18:13 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis back to Parsoid/JS - [[phab:T229015|T229015]] (duration: 00m 52s)
* 18:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 18:02 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Use HTTPS for contacting Parsoid/PHP - [[phab:T229015|T229015]] (duration: 00m 53s)
* 17:52 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Switch private wikis to Parsoid/PHP; file 4/4 -- [[phab:T229015|T229015]] (duration: 00m 53s)
* 17:51 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis to Parsoid/PHP; file 3/4 -- [[phab:T229015|T229015]] (duration: 00m 51s)
* 17:50 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch private wikis to Parsoid/PHP; file 2/4 -- [[phab:T229015|T229015]] (duration: 00m 53s)
* 17:48 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: Switch private wikis to Parsoid/PHP; file 1/4 -- [[phab:T229015|T229015]] (duration: 00m 53s)
* 17:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 16m 43s)
* 17:10 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - [[phab:T229015|T229015]]
* 17:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP (duration: 02m 38s)
* 17:06 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP
* 16:54 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 16:48 sbassett@deploy1001: Finished scap: Deploying [[phab:T238451|T238451]] (ext:AbuseFilter), running scap sync for i18n issues. (duration: 16m 42s)
* 16:31 sbassett@deploy1001: Started scap: Deploying [[phab:T238451|T238451]] (ext:AbuseFilter), running scap sync for i18n issues.
* 15:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
* 15:42 mforns@deploy1001: Finished deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107) (duration: 10m 50s)
* 15:31 mforns@deploy1001: Started deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107)
* 15:30 ema: pool cp1079 with ATS backend [[phab:T227432|T227432]]
* 15:22 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 15:19 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 15:13 akosiaris: purge https://releases.wikimedia.org/charts/eventgate-0.0.13.tgz, https://releases.wikimedia.org/charts/ and https://releases.wikimedia.org/charts/index.yaml
* 15:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 bblack: DONE testing deployment software changes on authdns cluster, back to normal
* 15:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 ema: depool cp1079 and reimage as text_ats [[phab:T227432|T227432]]
* 14:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: Agent filter changes (duration: 18m 33s)
* 14:43 bblack: testing deployment software changes on authdns cluster, please hold dns changes for a few!
* 14:41 thcipriani: restarting Jenkins for update
* 14:28 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: Agent filter changes
* 13:59 ema: pool cp1077 with ATS backend [[phab:T227432|T227432]]
* 13:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:39 ema@cumin1001: START - Cookbook sre.hosts.downtime
* 13:20 ema: depool cp1077 and reimage as text_ats [[phab:T227432|T227432]]
* 11:53 reedy@deploy1001: Finished scap: [[phab:T234450|T234450]] (duration: 19m 20s)
* 11:42 effie: enable puppet on all mw hosts
* 11:33 reedy@deploy1001: Started scap: [[phab:T234450|T234450]]
* 11:09 Urbanecm: EU SWAT done
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e4861ec}}: Set correct language for shywiktionary ([[phab:T238105|T238105]]) (duration: 00m 52s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|68d2003}}: Restrict editing CNBanner namespace to autoconfirmed on metawiki ([[phab:T238723|T238723]]) (duration: 00m 54s)
* 11:05 effie: disable puppet on mw[1-2]*
* 10:49 volans: restarting tcpircbot-logmsgbot on icinga1001, has failed to log some messages, no useful log on the host
* 10:22 ema: pool cp2023 with Varnish backend [[phab:T238817|T238817]] [[phab:T227432|T227432]]
* 10:18 arturo: update buster-wikimedia thirdparty/kubeadm-k8s packages (newer version will be used to handle [[phab:T238654|T238654]])
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331<nowiki>{</nowiki>2,7<nowiki>}</nowiki> after upgrade', diff saved to https://phabricator.wikimedia.org/P9714 and previous config saved to /var/cache/conftool/dbconfig/20191121-095401-marostegui.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331<nowiki>{</nowiki>2,7<nowiki>}</nowiki> after upgrade', diff saved to https://phabricator.wikimedia.org/P9713 and previous config saved to /var/cache/conftool/dbconfig/20191121-093958-marostegui.json
* 09:39 ema: depool cp2023 and reimage back as varnish-be [[phab:T238817|T238817]] [[phab:T227432|T227432]]
* 09:38 marostegui: Stop MySQL on db1067 - [[phab:T238297|T238297]]
* 09:27 marostegui: Upgrade db1090:3312, db1090:3317
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P9712 and previous config saved to /var/cache/conftool/dbconfig/20191121-092554-marostegui.json
* 09:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9711 and previous config saved to /var/cache/conftool/dbconfig/20191121-090623-marostegui.json
* 09:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 08:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9710 and previous config saved to /var/cache/conftool/dbconfig/20191121-085644-marostegui.json
* 08:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9709 and previous config saved to /var/cache/conftool/dbconfig/20191121-084500-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9708 and previous config saved to /var/cache/conftool/dbconfig/20191121-083322-marostegui.json
* 08:21 marostegui: Upgrade db1079
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for upgrade', diff saved to https://phabricator.wikimedia.org/P9707 and previous config saved to /var/cache/conftool/dbconfig/20191121-082108-marostegui.json
* 07:57 akosiaris: upgrade OTRS to 5.0.39 [[phab:T225925|T225925]]
* 07:56 marostegui: Promote db2133 to codfw m2 master - [[phab:T238183|T238183]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9706 and previous config saved to /var/cache/conftool/dbconfig/20191121-072543-marostegui.json
* 07:18 marostegui: Upgrade db1125 (sanitarium)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9705 and previous config saved to /var/cache/conftool/dbconfig/20191121-071758-marostegui.json
* 06:56 marostegui: Repool labsdb1009
* 06:32 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db1124:3313 [[phab:T238115|T238115]] [[phab:T238114|T238114]] [[phab:T237373|T237373]] [[phab:T238522|T238522]] [[phab:T236404|T236404]]
* 06:30 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db2094:3313 [[phab:T238115|T238115]] [[phab:T238114|T238114]] [[phab:T237373|T237373]] [[phab:T238522|T238522]] [[phab:T236404|T236404]]
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9704 and previous config saved to /var/cache/conftool/dbconfig/20191121-062412-marostegui.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9703 and previous config saved to /var/cache/conftool/dbconfig/20191121-061711-marostegui.json
* 06:16 marostegui: Compress db2081
* 06:13 marostegui: Stop MySQL on db1107 [[phab:T238113|T238113]]
* 06:06 marostegui: Compress db2083
* 05:57 marostegui: Depool labsdb1009 for upgrade
* 05:56 marostegui: Upgrade db1086
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for upgrade', diff saved to https://phabricator.wikimedia.org/P9702 and previous config saved to /var/cache/conftool/dbconfig/20191121-055557-marostegui.json
* 05:53 marostegui: Compress db2073
* 00:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config does not seem to be applying on half the app servers, resyncing (duration: 00m 52s)
* 00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable suggested edits without opt-in ([[phab:T227728|T227728]]) (duration: 00m 52s)
* 00:18 catrope@deploy1001: Finished scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n) (duration: 15m 57s)
* 00:02 catrope@deploy1001: Started scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n)
== 2019-11-20 ==
* 23:14 Amir1: finished creating five wikis, total duration 134 minutes
* 23:14 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
* 23:11 ladsgroup@deploy1001: Synchronized langlist: [[phab:T238105|T238105]] (duration: 00m 50s)
* 23:10 ladsgroup@deploy1001: Synchronized static/images/project-logos/: [[phab:T238105|T238105]] (duration: 00m 52s)
* 23:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238105|T238105]] (duration: 00m 51s)
* 23:08 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: [[phab:T238105|T238105]] (duration: 00m 51s)
* 23:05 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T238105|T238105]]
* 22:59 ladsgroup@deploy1001: Synchronized dblists: [[phab:T238105|T238105]] (duration: 00m 53s)
* 22:49 ladsgroup@deploy1001: Synchronized langlist: [[phab:T238104|T238104]] (duration: 00m 51s)
* 22:48 ladsgroup@deploy1001: Synchronized static/images/project-logos/: [[phab:T238104|T238104]] (duration: 00m 52s)
* 22:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238104|T238104]] (duration: 00m 52s)
* 22:43 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: [[phab:T238104|T238104]] (duration: 00m 51s)
* 22:41 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T238104|T238104]]
* 22:36 ladsgroup@deploy1001: Synchronized dblists: [[phab:T238104|T238104]] (duration: 00m 52s)
* 22:22 ladsgroup@deploy1001: Synchronized langlist: [[phab:T237369|T237369]] (duration: 00m 53s)
* 22:21 ladsgroup@deploy1001: Synchronized static/images/project-logos/: [[phab:T237369|T237369]] (duration: 00m 52s)
* 22:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T237369|T237369]] (duration: 00m 51s)
* 22:17 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: [[phab:T237369|T237369]] (duration: 00m 51s)
* 22:15 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T237369|T237369]]
* 22:11 ladsgroup@deploy1001: Synchronized dblists: [[phab:T237369|T237369]] (duration: 00m 52s)
* 22:00 Urbanecm: Wiki creation continues
* 21:56 ladsgroup@deploy1001: Synchronized langlist: [[phab:T236861|T236861]] (duration: 00m 52s)
* 21:55 ladsgroup@deploy1001: Synchronized static/images/project-logos/: [[phab:T236861|T236861]] (duration: 00m 51s)
* 21:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T236861|T236861]] (duration: 00m 52s)
* 21:52 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: [[phab:T236861|T236861]] (duration: 00m 51s)
* 21:49 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T236861|T236861]]
* 21:44 ladsgroup@deploy1001: Synchronized dblists: [[phab:T236861|T236861]] (duration: 00m 52s)
* 21:38 Urbanecm: mwscript createAndPromote.php --wiki=gewikimedia --sysop --bureaucrat Mehman97 <password redacted> ([[phab:T236389|T236389]])
* 21:35 gehel: repool wdqs1004 - [[phab:T238229|T238229]]
* 21:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: new wiki gewikimedia ([[phab:T236389|T236389]]) (duration: 00m 52s)
* 21:29 urbanecm@deploy1001: Synchronized static/images/project-logos/: new wiki gewikimedia ([[phab:T236389|T236389]]) (duration: 00m 53s)
* 21:28 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: new wiki gewikimedia ([[phab:T236389|T236389]]) (duration: 00m 52s)
* 21:27 ejegg: Fundraising CiviCRM updated from {{Gerrit|2802bdd649}} to {{Gerrit|852c4a36bd}}
* 21:23 mutante: notebook1003 - systemctl start nagios-nrpe-server (second time today already [[phab:T212824|T212824]])
* 21:20 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: new wiki gewikimedia ([[phab:T236389|T236389]])
* 21:16 urbanecm@deploy1001: Synchronized dblists: new wiki gewikimedia ([[phab:T236389|T236389]]) (duration: 00m 52s)
* 21:01 ssastry@deploy1001: Finished deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test [[phab:T238748|T238748]] fix (duration: 07m 20s)
* 20:53 ssastry@deploy1001: Started deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test [[phab:T238748|T238748]] fix
* 20:37 ssastry@deploy1001: Finished deploy [parsoid/deploy@d5646b7]: Updating Parsoid to {{Gerrit|2e79460d}} (duration: 09m 14s)
* 20:27 ssastry@deploy1001: Started deploy [parsoid/deploy@d5646b7]: Updating Parsoid to {{Gerrit|2e79460d}}
* 20:27 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 20:23 mutante: notebook1003 - sudo systemctl nagios-nrpe-server (as usual ....)
* 20:19 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 19:31 ejegg: updated fundraising internal dashboard from {{Gerrit|69fdbec60d}} to {{Gerrit|8fc2726736}}
* 19:04 mutante: xhgui1001 - initial puppet run, signed puppet cert on puppetmaster1001
* 18:56 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 50s)
* 18:51 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 54s)
* 18:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 170 (duration: 00m 53s)
* 18:31 mutante: ganeti - introducing and installing buster on new VMs xhgui1001/xhgui2001 - for replacing tungsten (jessie) [[phab:T238098|T238098]]
* 18:17 mobrovac: morning SWAT done
* 18:17 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.5/includes/libs/virtualrest/ParsoidVirtualRESTService.php: Parsoid VRS: Add the Host header - [[phab:T229015|T229015]] [[phab:T229078|T229078]] [[phab:T229074|T229074]] (duration: 00m 52s)
* 18:13 shdubsh: restart mtail on fermium
* 17:40 ema: pool cp2023 with ATS backend [[phab:T227432|T227432]]
* 17:24 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:21 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:19 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 17:18 andrewbogott: upgrading pdns to version 4 on cloudservices1003
* 17:06 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:04 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 17:03 andrewbogott: upgrading pdns to version 4 on cloudvirt1004 [[phab:T210715|T210715]]
* 16:58 andrewbogott: disabling puppet on cloudvirt1003 and 1004 for [[phab:T210715|T210715]]
* 16:55 moritzm: installing rpcbind bugfix updates from buster 10.2 point release
* 16:43 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 16:23 ema: depool cp2023 and reimage as text_ats [[phab:T227432|T227432]]
* 16:14 ema: pool cp2019 with ATS backend [[phab:T227432|T227432]]
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9695 and previous config saved to /var/cache/conftool/dbconfig/20191120-160813-marostegui.json
* 16:03 gehel: depool wdqs1004 to allow catching up on lag - [[phab:T238229|T238229]]
* 15:42 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: [BETA-ONLY] Switch Flow to use Parsoid/PHP - [[phab:T229078|T229078]] (duration: 00m 52s)
* 15:40 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:38 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 180 [[gerrit:552069]] (duration: 00m 52s)
* 15:19 ema: depool cp2019 and reimage as text_ats [[phab:T227432|T227432]]
* 15:08 gehel: reset LVS weight for wdqs public eqiad to 10
* 15:05 effie: Enable puppet on mw*
* 14:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221774|T221774]] - wgWikidataOrgQueryServiceMaxLagFactor 180 [[gerrit:552069]] (duration: 00m 52s)
* 14:50 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: [[phab:T221774|T221774]] - Wikidata.org extension (use altered lag, not raw lag) [[gerrit:552072]] (duration: 00m 53s)
* 14:49 ema: pool cp2016 with ATS backend [[phab:T227432|T227432]]
* 14:47 effie: disable puppet on all mw* servers
* 14:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 14:06 ema: depool cp2016 and reimage as text_ats [[phab:T227432|T227432]]
* 13:32 godog: updated puppet compiler facts on compiler100* hosts
* 12:43 ema: pool cp2013 with ATS backend [[phab:T227432|T227432]]
* 12:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:25 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 12:08 ema: depool cp2013 and reimage as text_ats [[phab:T227432|T227432]]
* 11:59 ema: pool cp2012 with ATS backend [[phab:T227432|T227432]]
* 11:55 Urbanecm: EU SWAT done
* 11:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|2b13fbe}}: [rowiki] Enable deleterevision for patrollers ([[phab:T234051|T234051]]) (duration: 00m 52s)
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|51ecd71}}: Partial cleanup of InitializeSettings ([[phab:T231178|T231178]]) (duration: 00m 52s)
* 11:42 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 11:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|f847380}}: Set namespace alias for Index: (NS 102/103) for elwikisource ([[phab:T237253|T237253]]) (duration: 00m 54s)
* 11:36 urbanecm@deploy1001: Finished scap: SWAT: {{Gerrit|44ec4e4}}: {{Gerrit|e1baf0e}}:  {{Gerrit|3c02aa7}}: Namespace changes (duration: 06m 15s)
* 11:30 urbanecm@deploy1001: Started scap: SWAT: {{Gerrit|44ec4e4}}: {{Gerrit|e1baf0e}}:  {{Gerrit|3c02aa7}}: Namespace changes
* 11:27 ema: cp2010: ats-backend-restart to clear backend restart alert
* 11:21 ema: depool cp2012 and reimage as text_ats [[phab:T227432|T227432]]
* 11:15 ema: pool cp2010 with ATS backend [[phab:T227432|T227432]]
* 10:54 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:52 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 10:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - [[phab:T238716|T238716]] (duration: 13m 56s)
* 10:34 ema: depool cp2010 and reimage as text_ats [[phab:T227432|T227432]]
* 10:30 marostegui: Upgrade db1116
* 10:22 mobrovac@deploy1001: Started deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - [[phab:T238716|T238716]]
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P9694 and previous config saved to /var/cache/conftool/dbconfig/20191120-101727-marostegui.json
* 10:14 marostegui: Compress db2095:3314
* 10:07 mobrovac@deploy1001: Finished deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - [[phab:T238716|T238716]] (duration: 14m 54s)
* 09:56 marostegui: Compress db2106
* 09:52 mobrovac@deploy1001: Started deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - [[phab:T238716|T238716]]
* 09:48 marostegui: Compress dbstore1005:3318
* 09:47 marostegui: Compress dbstore1004:3314
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9693 and previous config saved to /var/cache/conftool/dbconfig/20191120-093308-marostegui.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9692 and previous config saved to /var/cache/conftool/dbconfig/20191120-092337-marostegui.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9691 and previous config saved to /var/cache/conftool/dbconfig/20191120-090739-marostegui.json
* 08:55 marostegui: Upgrade db1094
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for upgrade', diff saved to https://phabricator.wikimedia.org/P9690 and previous config saved to /var/cache/conftool/dbconfig/20191120-085448-marostegui.json
* 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:43 marostegui: Promote db2132 as m1-codfw master - [[phab:T238183|T238183]]
* 07:19 marostegui: Upgrade db2062
* 07:19 marostegui: Upgrade db2078
* 07:14 marostegui: Deploy schema change on s3 (testwikidatawiki) directly on s3 primary master [[phab:T237120|T237120]]
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P9688 and previous config saved to /var/cache/conftool/dbconfig/20191120-070511-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1136', diff saved to https://phabricator.wikimedia.org/P9687 and previous config saved to /var/cache/conftool/dbconfig/20191120-065718-marostegui.json
* 06:44 marostegui: Upgrade db2118 (s7 codfw master)
* 06:41 marostegui: Repool labsdb1011
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1136 into s7 api', diff saved to https://phabricator.wikimedia.org/P9686 and previous config saved to /var/cache/conftool/dbconfig/20191120-064022-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136 after upgrade', diff saved to https://phabricator.wikimedia.org/P9685 and previous config saved to /var/cache/conftool/dbconfig/20191120-063628-marostegui.json
* 06:28 marostegui: Upgrade db1136
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for upgrade', diff saved to https://phabricator.wikimedia.org/P9684 and previous config saved to /var/cache/conftool/dbconfig/20191120-062749-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after upgrade', diff saved to https://phabricator.wikimedia.org/P9683 and previous config saved to /var/cache/conftool/dbconfig/20191120-062029-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9682 and previous config saved to /var/cache/conftool/dbconfig/20191120-061938-marostegui.json
* 05:58 marostegui: Stop MySQL on db1101:3317, db1101:3318 for upgrade and schema change
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for upgrade and schema change', diff saved to https://phabricator.wikimedia.org/P9681 and previous config saved to /var/cache/conftool/dbconfig/20191120-055732-marostegui.json
* 05:55 marostegui: Depool labsdb1011 for upgrade
* 05:54 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1105:3311 db1097:3314 db1098:3316 db1098:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9680 and previous config saved to /var/cache/conftool/dbconfig/20191120-055426-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P9679 and previous config saved to /var/cache/conftool/dbconfig/20191120-054840-marostegui.json
* 03:16 tgr: [[phab:T208369|T208369]] ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php kowiki --cutoff 350
* 02:57 vgutierrez: restarting pybal on lvs2002
* 02:54 vgutierrez: restarting pybal on lvs2005
* 02:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 02:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 00:10 mutante: phab2001 - restart ssh-phab service after repooling it after buster reinstall, it wasn't listening on the IPv6 IP, causing LVS/pybal alerts
* 00:06 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Pass token as editing_session_id for suggested edits ([[phab:T238249|T238249]]) (duration: 00m 53s)
* 00:02 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: EditAttemptStep: Allow overriding session ID ([[phab:T238249|T238249]]) (duration: 00m 52s)
* 00:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikiEditor/: EditAttemptStep: Allow overriding session ID ([[phab:T238249|T238249]]) (duration: 00m 54s)
== 2019-11-19 ==
* 23:58 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MobileFrontend/: EditAttemptStep: Allow overriding session ID ([[phab:T238249|T238249]]) (duration: 00m 53s)
* 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikimediaEvents/: EditAttemptStep: Allow other extensions to trigger oversampling ([[phab:T238249|T238249]]) (duration: 00m 53s)
* 23:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 21:45 XioNoX: rebooting pfw3-codfw:node1 for upgrade - [[phab:T235150|T235150]]
* 21:14 XioNoX: rebooting pfw3-codfw for upgrade - [[phab:T235150|T235150]]
* 20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:17 gehel: completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - [[phab:T212826|T212826]]
* 20:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:10 XioNoX: homer push on mgmt routers
* 20:09 mutante: phab1003 after merging gerrit:551910 puppet now also stopped the actual aphlict service and removed the systemd unit file. had to manually run 'systemctl reset-failed' though to clean systemd status and avoid icinga alert ([[phab:T238593|T238593]])
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:18 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 19:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop ([[phab:T229286|T229286]]) (duration: 06m 49s)
* 19:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop ([[phab:T229286|T229286]])
* 19:00 elukey: regenerate TLS cert for yarn.wikimedia.org (containing SANs for all analytics UIs) to add datasets.w.o SAN (site was failing due to ATS not being able to contact thorium)
* 18:59 rlazarus: restarted php7.2-fpm on wtp2001, wtp2002
* 18:56 rlazarus: restarted php7.2-fpm on wtp1025, wtp1026
* 18:35 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: Unbreak instrumentation of init events (duration: 00m 53s)
* 18:34 ssastry@deploy1001: Finished deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to {{Gerrit|1a1105a7}} (duration: 02m 04s)
* 18:32 ssastry@deploy1001: Started deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to {{Gerrit|1a1105a7}}
* 18:30 mutante: icinga config - manually added team-dcops, started icinga
* 18:20 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: [[phab:T221774|T221774]] - Wikidata.org extension (queryservice maxlag, hook) [[gerrit:551858]] (duration: 00m 53s)
* 18:12 RoanKattouw: That was eowiktionary, not eowikisource
* 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure default search namespaces for eowikisource ([[phab:T237792|T237792]]) (duration: 00m 52s)
* 17:43 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: [[phab:T221774|T221774]] - Wikidata.org extension (queryservice maxlag, maint script) [[gerrit:551857]] (duration: 00m 52s)
* 17:39 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:11 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: [[phab:T221774|T221774]] - Wikidata.org extension (queryservice maxlag) [[gerrit:551855]] [[gerrit:551856]] (duration: 00m 54s)
* 17:02 volker-e@deploy1001: Finished deploy [design/style-guide@d73818a]: Deploy design/style-guide:  (duration: 00m 07s)
* 17:02 volker-e@deploy1001: Started deploy [design/style-guide@d73818a]: Deploy design/style-guide:
* 16:58 ema: pool cp2007 with ATS backend [[phab:T227432|T227432]]
* 16:30 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 16:25 moritzm: installing glib2.0 security updates
* 16:21 mutante: phab1003 - puppet restarts aphlict service even with "phabricator_aphlict_enabled: false" in Hiera. But it does properly remove the proxy config lines from apache. so service is running but not used. ([[phab:T238593|T238593]])
* 16:17 mutante: phab1003 - systemctl stop aphlict (proxy config in apache is disabled as well as disabled in ATS) ([[phab:T238593|T238593]])
* 16:15 gehel: reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - [[phab:T212826|T212826]]
* 16:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:10 ema: depool cp2007 and reimage as text_ats [[phab:T227432|T227432]]
* 16:09 ema: pool cp2006 with ATS backend [[phab:T227432|T227432]]
* 15:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure (duration: 02m 11s)
* 15:57 mobrovac@deploy1001: Started deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure
* 15:37 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:34 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 15:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - [[phab:T229015|T229015]] (duration: 14m 22s)
* 15:15 ema: depool cp2006 and reimage as text_ats [[phab:T227432|T227432]]
* 15:13 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - [[phab:T229015|T229015]]
* 15:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP (duration: 02m 58s)
* 15:07 ema: pool cp2004 with ATS backend [[phab:T227432|T227432]]
* 15:06 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP
* 14:38 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:36 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 14:34 gehel: restarting blazegraph with additional logging on wdqs1004 - [[phab:T231411|T231411]]
* 14:18 ema: depool cp2004 and reimage as text_ats [[phab:T227432|T227432]]
* 14:13 ema: pool cp2001 with ATS backend [[phab:T227432|T227432]]
* 13:57 marostegui: Deploy schema change on metawiki directly on s7 master [[phab:T238370|T238370]]
* 13:57 marostegui: Deploy schema change on mediawikiwiki directly on s7 master [[phab:T238370|T238370]]
* 13:55 marostegui: Deploy schema change on mediawikiwiki directly on s3 master [[phab:T238370|T238370]]
* 13:50 marostegui: Deploy schema change on foundationwiki directly on s3 master - [[phab:T238370|T238370]]
* 13:46 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T238370|T238370]]
* 13:39 marostegui: Deploy schema change on db1092
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P9673 and previous config saved to /var/cache/conftool/dbconfig/20191119-133850-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9672 and previous config saved to /var/cache/conftool/dbconfig/20191119-133704-marostegui.json
* 13:34 ema@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:33 ema@cumin2001: START - Cookbook sre.hosts.downtime
* 13:14 ema: depool cp2001 and reimage as text_ats [[phab:T227432|T227432]]
* 12:42 jbond42: add libapache2-mod-auth-cas 1.2-1 to stretch-wikimedia repo
* 12:28 effie: enable puppet on P:mediawiki::php and *.eqiad.wmnet
* 12:22 effie: enable puppet on P:mediawiki::php and *.codfw.wmnet
* 12:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1067 from config [[phab:T238297|T238297]] (duration: 00m 52s)
* 12:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1067 from config [[phab:T238297|T238297]] (duration: 00m 52s)
* 11:41 gehel: depooling wdqs1004 - [[phab:T231411|T231411]]
* 11:37 gehel: restarting wdqs blazegraph on wdqs1004 - [[phab:T231411|T231411]]
* 11:29 marostegui: Upgrade dbstore1003 (3311,3315,3317)
* 11:16 gehel: restarting wdqs updater on wdqs1004 - [[phab:T231411|T231411]]
* 10:36 marostegui: Compress and upgrade db1098:3316
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9671 and previous config saved to /var/cache/conftool/dbconfig/20191119-103540-marostegui.json
* 10:34 marostegui: Compress and upgrade db1098:3317
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9670 and previous config saved to /var/cache/conftool/dbconfig/20191119-103426-marostegui.json
* 10:29 marostegui: Upgrade db2077
* 10:24 marostegui: Upgrade db2120 db2121 db2122
* 10:10 marostegui: Upgrade MySQL on db2086 db2087 db2100
* 10:06 godog: repool centrallog2001
* 09:40 effie: disable puppet on P:mediawiki::php - [[phab:T229792|T229792]]
* 09:21 moritzm: installing ncurses security updates
* 09:20 moritzm: rolling restart of nginx on acmechief/puppetdb to pick up libxslt security updates
* 09:08 moritzm: installing libxslt security updates
* 09:08 marostegui: Deploy schema change on db1101:3318
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9669 and previous config saved to /var/cache/conftool/dbconfig/20191119-090823-marostegui.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9668 and previous config saved to /var/cache/conftool/dbconfig/20191119-090745-marostegui.json
* 09:05 marostegui: Repool labsdb1010
* 07:33 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Enable math links in Beta - [[phab:T208758|T208758]] (duration: 00m 53s)
* 06:45 marostegui: Stop MySQL on db2061 [[phab:T238526|T238526]]
* 06:44 marostegui: Remove db2061 from tendril and zarcillo [[phab:T238526|T238526]]
* 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2061 from config [[phab:T238526|T238526]] (duration: 00m 52s)
* 06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2061 from config [[phab:T238526|T238526]] (duration: 00m 53s)
* 06:26 vgutierrez: Move cp1089 from nginx to ats-tls - [[phab:T231627|T231627]]
* 06:20 marostegui: Depool labsdb1010 for upgrade
* 06:02 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1131 to s6 master and remove read-only from s6 [[phab:T235469|T235469]]', diff saved to https://phabricator.wikimedia.org/P9667 and previous config saved to /var/cache/conftool/dbconfig/20191119-060203-marostegui.json
* 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance [[phab:T235469|T235469]]', diff saved to https://phabricator.wikimedia.org/P9666 and previous config saved to /var/cache/conftool/dbconfig/20191119-060122-marostegui.json
* 06:01 marostegui: Starting s6 failover from db1061 to db1131 - [[phab:T235469|T235469]]
* 05:37 eileen: process control - I reverted the above to check some stuff first
* 05:36 vgutierrez: Move cp1087 from nginx to ats-tls - [[phab:T231627|T231627]]
* 05:26 marostegui: Deploy schema change on db1099:3318
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9665 and previous config saved to /var/cache/conftool/dbconfig/20191119-052632-marostegui.json
* 05:25 marostegui: Compress db1097:3314
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9664 and previous config saved to /var/cache/conftool/dbconfig/20191119-052412-marostegui.json
* 05:17 vgutierrez: Move cp1085 from nginx to ats-tls - [[phab:T231627|T231627]]
* 05:14 marostegui: Compress tables on db1105:3311
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9663 and previous config saved to /var/cache/conftool/dbconfig/20191119-051344-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after compression', diff saved to https://phabricator.wikimedia.org/P9662 and previous config saved to /var/cache/conftool/dbconfig/20191119-051259-marostegui.json
* 05:12 eileen: process-control config revision is {{Gerrit|9fbfc79988}} - change gap on repair job to 16 hours to reflect the with-daylight-savings ones
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T235469|T235469]] ', diff saved to https://phabricator.wikimedia.org/P9661 and previous config saved to /var/cache/conftool/dbconfig/20191119-050748-marostegui.json
* 05:02 marostegui: Start pre-switchover steps [[phab:T235469|T235469]]
* 04:47 vgutierrez: Move cp2023 from nginx to ats-tls - [[phab:T231627|T231627]]
* 04:17 vgutierrez: Move cp2019 from nginx to ats-tls - [[phab:T231627|T231627]]
* 03:53 vgutierrez: Move cp2016 from nginx to ats-tls - [[phab:T231627|T231627]]
* 03:51 tgr: [[phab:T208369|T208369]] ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php cswiki --cutoff 350
* 03:37 vgutierrez: Move cp2013 from nginx to ats-tls - [[phab:T231627|T231627]]
* 01:12 ejegg: re-enabled fundraising CiviCRM contact de-duplication jobs
* 01:05 ejegg: disabled fundraising CiviCRM contact de-duplication jobs
* 00:54 ejegg: updated civicrm from {{Gerrit|1f454aa69a}} to {{Gerrit|2802bdd649}}
* 00:39 mutante: phab2001 - rsyncing /srv/repos data from phab1003 ([[phab:T190568|T190568]])

Revision as of 01:00, 3 December 2019

2019-12-03

  • 01:00 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.8/extensions/Translate/sql/translate_{…}.sql T239091
  • 00:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Enable Translate extension on sewikimedia (duration: 00m 57s)
  • 00:54 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.5/extensions/Wikibase/client/sql/entity_usage.sql
  • 00:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Echo/includes/DiscussionParser.php: T239275 Fix type hint fatal from getUserLinks() (duration: 01m 16s)
  • 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-12-02

  • 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
  • 23:05 mutante: mw2248 - restart nginx (for some reason the unit was running but not listening on 443 after reimage; now it does)
  • 23:05 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:02 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:46 ejegg: updated payments-wiki from 06a8c3cdff to f61c9f0692
  • 22:44 bblack: reimaging dns4002 to buster - T239667
  • 22:07 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Update text for no personal uploads message (T238873) (duration: 01m 03s)
  • 22:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
  • 21:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
  • 21:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
  • 21:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:22 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P9796 and previous config saved to /var/cache/conftool/dbconfig/20191202-205904-marostegui.json
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=nginx,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=apache2,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=nginx,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=apache2,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=nginx,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=apache2,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=nginx,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=apache2,dc=codfw
  • 20:36 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch Flow on all wikis to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 20:35 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015 (duration: 14m 59s)
  • 20:12 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2) (duration: 00m 05s)
  • 20:12 joal@deploy1001: Started deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2)
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2) (duration: 08m 08s)
  • 20:06 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015
  • 20:05 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labslabslabs (duration: 01m 08s)
  • 20:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP (duration: 02m 48s)
  • 20:02 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:59 joal@deploy1001: Started deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2)
  • 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:56 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:51 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:50 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:50 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015 (duration: 13m 48s)
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:23 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015
  • 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP (duration: 06m 38s)
  • 19:16 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP
  • 19:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015 (duration: 14m 11s)
  • 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 mobrovac@deploy1001: Started deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015
  • 18:39 joal@deploy1001: Finished deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy (duration: 00m 06s)
  • 18:39 joal@deploy1001: Started deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy
  • 18:38 joal@deploy1001: Finished deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy (duration: 08m 21s)
  • 18:32 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:30 joal@deploy1001: Started deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy
  • 18:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes (duration: 15m 42s)
  • 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:00 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes
  • 17:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015 (duration: 14m 06s)
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:42 mobrovac@deploy1001: Started deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015
  • 17:29 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495 (duration: 01m 14s)
  • 17:28 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495
  • 17:21 ssastry@deploy1001: Finished deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy (duration: 07m 48s)
  • 17:14 ssastry@deploy1001: Started deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy
  • 17:09 ejegg: disabled fundraising job omnimail_groupmember_load
  • 16:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:43 ejegg: updated fundraising internal dashboard from 8fc2726736 to 3a93d2aba4
  • 16:43 effie: restart all API cluster in eqiad
  • 16:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 hashar: Restarted CI Jenkins
  • 16:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015 (duration: 13m 53s)
  • 16:41 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=global T238494
  • 16:32 ema: cp3053: repooling after firmware update T239041
  • 16:27 mobrovac@deploy1001: Started deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015
  • 16:19 effie: reimage mw1295.eqiad.wmnet mw1294.eqiad.wmnet mw1293.eqiad.wmnet
  • 16:11 robh: cp3053 depooling and rebooting for firmware update T239041
  • 16:10 robh: cp3035 depooling and rebooting for firmware update T239041
  • 15:38 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:38 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid VRS: Switch groups 0 and 1 to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 15:35 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607 (duration: 14m 51s)
  • 15:26 effie: Rolling restart mw1345-1348
  • 15:15 mobrovac@deploy1001: Started deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607
  • 14:46 ema: cp-ats: set server_session_sharing.match=2 everywhere (puppet re-enable and run) T238494
  • 14:31 ema: cp-ats: merge server_session_sharing.match=2 (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553490/) with puppet disabled, test on cp3050 T238494
  • 14:18 godog: set grafana theme back to light, was dark for some reason
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P9794 and previous config saved to /var/cache/conftool/dbconfig/20191202-135643-marostegui.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P9793 and previous config saved to /var/cache/conftool/dbconfig/20191202-135543-marostegui.json
  • 13:47 ema: power-cycle cp3053 T239041
  • 13:44 hashar: Restarted CI Jenkins
  • 13:30 hashar: Restarted CI Jenkins
  • 13:14 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015 (duration: 14m 49s)
  • 13:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015
  • 12:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes (duration: 02m 54s)
  • 12:54 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes
  • 12:54 Urbanecm: EU SWAT done
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d27fe78: Enable partial blocks on eswiki (T239370) (duration: 01m 00s)
  • 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 445bdc3: Remove `move-rootuserpages` from user on svwiki (T238842) (duration: 01m 04s)
  • 12:43 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki*.png
  • 12:39 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 61a9563: Revert "Change bawiki logo to an anniversary one" (T237070) (duration: 01m 06s)
  • 12:37 effie: reimage mw1296.eqiad.wmnet
  • 12:37 effie: reimage mw1298.eqiad.wmnet
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set read new for term store for items of wikidata up to Q1000 (T225057) (duration: 01m 00s)
  • 12:19 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/GrowthExperiments/: SWAT: Suggested edits: do not treat AQS lookup failure as error (T238178) (duration: 01m 02s)
  • 11:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:50 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
  • 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 moritzm: installing ruby2.1 security updates
  • 10:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 moritzm: installing python-psutil security updates
  • 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:42 effie: reimage mw1299.eqiad.wmnet
  • 10:18 effie: reimage mw1290.eqiad.wmnet
  • 10:18 effie: reimage mw1275.eqiad.wmnet
  • 10:15 moritzm: installing file/libmagic regression update for jessie
  • 10:08 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
  • 09:52 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:41 joal@deploy1001: Finished deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin) (duration: 00m 08s)
  • 09:41 joal@deploy1001: Started deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin)
  • 09:40 joal@deploy1001: Finished deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week (duration: 18m 22s)
  • 09:23 effie: reimage mw1300.eqiad.wmnet
  • 09:22 joal@deploy1001: Started deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week
  • 09:16 moritzm: installing libvpx security updates
  • 09:14 godog: extend graphite LVs on graphite1004 / graphite2003 by 200G
  • 08:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 effie: reimage mw1287.eqiad.wmnet mw1288.eqiad.wmnet mw1289.eqiad.wmnet
  • 08:08 effie: reimage mw1301.eqiad.wmnet
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 andrewbogott: forcing a reboot of cloudstore1008 via mgmt console — it seems to have locked up
  • 06:43 Urbanecm: Clear account creation throttle for several IPs (T239465)
  • 06:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for cawiki workshop (T239465) (duration: 01m 03s)
  • 06:00 marostegui: Compress s8 codfw master (lag might appear on codfw s8)
  • 06:00 marostegui: Compress s4 codfw master (lag might appear on codfw s4)
  • 05:56 marostegui: Deploy schema change on db1075
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P9791 and previous config saved to /var/cache/conftool/dbconfig/20191202-055546-marostegui.json
  • 05:53 marostegui: Compress db1099:3318 T235599
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for compression', diff saved to https://phabricator.wikimedia.org/P9790 and previous config saved to /var/cache/conftool/dbconfig/20191202-055245-marostegui.json

2019-12-01

  • 23:27 ladsgroup@deploy1001: Started restart [mobileapps/deploy@70154b4]: Rolling restart of mobileapps
  • 23:20 bblack: restarting AQS services in eqiad
  • 23:15 eileen: process-control config revision is 9750c318a0 - jobs disabled
  • 21:39 andrewbogott: restarted nova conductor and api on cloudcontrol1003 and 1004 to free up db connections (T239168)

2019-11-30

  • 15:47 Urbanecm: Reset email of SUL user Hayk.arabaget (T239462)
  • 07:40 vgutierrez: repooling cp3057 - T239502
  • 07:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 07:30 vgutierrez: depool and powercycle cp3057 - T239502

2019-11-29

  • 22:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:12 effie: reimage mw1302.eqiad.wmnet
  • 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:19 effie: reimage mw1284.eqiad.wmnet
  • 19:19 effie: reimage mw1303.eqiad.wmnet mw1283.eqiad.wmnet
  • 17:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
  • 16:17 effie: reimage mw1274.eqiad.wmnet
  • 16:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 effie: reimage mw1282.eqiad.wmnet
  • 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 effie: reimage mw1323.eqiad.wmnet mw1297.eqiad.wmnet mw1273.eqiad.wmnet
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 filippo@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
  • 14:13 godog: reimage mw2228 for partman tests
  • 14:02 effie: reimage mw1271.eqiad.wmnet mw1272.eqiad.wmnet mw1304.eqiad.wmnet
  • 13:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 jynus: reenable puppet on dbprov2001, backup1001
  • 13:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 jynus: disabling puppet also on backup1001 to test recoveries
  • 12:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 effie: reimage mw1305.eqiad.wmnet mw1265.eqiad.wmnet mw1270.eqiad.wmnet
  • 11:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:39 jynus: disabling puppet on dbprov2001 to test recoveries
  • 11:34 effie: reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmnet
  • 11:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 Lucas_WMDE: <effie> 10:58:17 log reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmnet
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 elukey@deploy1001: Finished deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts (duration: 00m 08s)
  • 10:47 elukey@deploy1001: Started deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts
  • 10:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 effie: reimage mw1306.eqiad.wmnet mw1264.eqiad.wmnet mw1279.eqiad.wmnet
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui: Remove triggers from db2094:3313 - T234704
  • 09:33 marostegui: Stop replication on db2105 (s3 codfw) for schema change
  • 09:23 effie: reimage mw1263.eqiad.wmnet mw1307.eqiad.wmnet
  • 09:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 volans: temporarily disabling puppet on 'R:keyholder::agent' to merge gerrit:operations/puppet/+/553460 - T239386
  • 09:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 effie: reimage mw2223.codfw.wmnet mw2222.codfw.wmnet mw2221.codfw.wmnet mw2220.codfw.wmnet
  • 07:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:25 effie: reimage mw1312.eqiad.wmnet mw1308.eqiad.wmnet mw1261.eqiad.wmnet
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P9781 and previous config saved to /var/cache/conftool/dbconfig/20191129-055845-marostegui.json
  • 05:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.5/includes/exception/MWExceptionHandler.php: 532f4aba96d85 (duration: 01m 03s)

2019-11-28

  • 23:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:21 effie: reimage mw1329.eqiad.wmnet
  • 23:01 effie: restart cp1087
  • 22:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:19 effie: reimage mw1309.eqiad.wmnet
  • 21:19 effie: reimage mw1323.eqiad.wmnet
  • 21:11 effie: reimage mw1316.eqiad.wmnet mw1315.eqiad.wmnet
  • 20:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:03 effie: reimage mw1313.eqiad.wmnet
  • 20:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 effie: reimage mw1331.eqiad.wmnet mw1330.eqiad.wmnet mw1310.eqiad.wmnet
  • 18:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:41 marostegui: Deploy schema change on db1134
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P9780 and previous config saved to /var/cache/conftool/dbconfig/20191128-183918-marostegui.json
  • 18:29 effie: reimage mw1319.eqiad.wmnet mw1318.eqiad.wmnet
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P9779 and previous config saved to /var/cache/conftool/dbconfig/20191128-180517-marostegui.json
  • 17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 effie: reimage mw1340.eqiad.wmnet mw1339.eqiad.wmnet
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:32 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:18 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:58 effie: reimage mw1311.eqiad.wmnet
  • 15:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 effie: reimage mw1333.eqiad.wmnet mw1332.eqiad.wmnet mw1331.eqiad.wmnet
  • 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 effie: reimage mw1343.eqiad.wmnet mw1342.eqiad.wmnet mw1341.eqiad.wmnet
  • 14:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 marostegui: Deploy schema change on s3 codfw on the master, lag will appear on s3 codfw (T234066)
  • 13:57 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 5 (T237984)
  • 13:57 marostegui: Deploy schema change on s4 codfw master with replication - T234066
  • 13:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:37 marostegui: Deploy schema change on db1106 with replication (lag will appear on s1 on labs) - T234066 T233135
  • 13:37 marostegui: Recreate views for enwiki_p.protected_titles for all labsdb hosts - T233135
  • 13:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:33 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:31 marostegui: Remove ar_comment triggers from db1124:3311 for enwiki.archive - T234704
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change, temporarily pool db1080 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9778 and previous config saved to /var/cache/conftool/dbconfig/20191128-133013-marostegui.json
  • 13:28 volans: cleanup root's crontab entries on netmon hosts from netbox/postgres stuff - T238919
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P9777 and previous config saved to /var/cache/conftool/dbconfig/20191128-132647-marostegui.json
  • 13:21 volans: cumin 'netmon*' 'rm -v /var/spool/cron/crontabs/postgres' T238919
  • 13:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 effie: enable puppet on thumbor*
  • 13:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:51 effie: disable puppet on thumbor*
  • 12:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 effie: reimage mw1267.eqiad.wmnet mw1277.eqiad.wmnet
  • 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 effie: reimage mw1344.eqiad.wmnet mw1334.eqiad.wmnet mw1324.eqiad.wmnet
  • 11:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 effie: reimage mw2279 mw2278 mw2277 mw2276 mw2275
  • 10:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 marostegui: Compress labsdb1009
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:17 effie: reimage mw1266, mw1276
  • 09:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 marostegui: Compress labsdb1011
  • 08:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 marostegui: Remove m4 from tendril and zarcillo - T159170
  • 08:15 effie: reimage mw2280, mw2281, mw2282
  • 08:06 marostegui: Compress labsdb1012
  • 07:56 effie: reimage mw1345, mw1335, mw1325
  • 06:56 elukey: remove log files on an-tool1007 to free root partition space
  • 06:14 marostegui: Remove db1061 from tendril and zarcillo - T238624
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:02 marostegui: Remove db2067 from tendril and zarcillo T233185
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P9776 and previous config saved to /var/cache/conftool/dbconfig/20191128-055212-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P9775 and previous config saved to /var/cache/conftool/dbconfig/20191128-055025-marostegui.json
  • 03:03 vgutierrez: restarting keyholder on acmechief[12]001
  • 01:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:59 mutante: mw2244 restart php-fpm and apache which somehow are returning 5xx after reimage
  • 00:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-11-27

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mutante: mw2215 scap pull
  • 21:30 mutante: mw2215 rebooting
  • 21:10 bblack: restarting acme-chief service on acmechief1001 (daemon appears to be stuck on a lock and nonfunctional for days...)
  • 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:14 cstone: payments-wiki revision changed from 2eb54fd6ef to 06a8c3cdff
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P9773 and previous config saved to /var/cache/conftool/dbconfig/20191127-193528-marostegui.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P9772 and previous config saved to /var/cache/conftool/dbconfig/20191127-193227-marostegui.json
  • 19:32 ebernhardson@deploy1001: Finished deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient (duration: 00m 45s)
  • 19:31 ebernhardson@deploy1001: Started deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient
  • 19:27 mutante: an-airflow1001 - apt-get install python3-mysqldb - start airflow-webserver
  • 19:24 ebernhardson@deploy1001: Finished deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package (duration: 00m 42s)
  • 19:23 ebernhardson@deploy1001: Started deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package
  • 19:08 ebernhardson@deploy1001: Finished deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance (duration: 00m 40s)
  • 19:08 ebernhardson@deploy1001: Started deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance
  • 19:00 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg (it tries to write this on first start and did not have permissions to do so) T236180
  • 18:58 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg
  • 18:57 eileen: process-control config revision is b95355c0c0 - repair omnirecipient job off
  • 16:57 andrewbogott: disabling puppet on cloudvirt* and cloudcontrol* while merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552894/
  • 16:50 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external
  • 16:32 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: dd4c76d3d SpecialContributions: max concurrency 3 (instead of 10) T234450 (duration: 01m 17s)
  • 16:22 ejegg: shifted daily silverpop export start time one hour earlier
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P9768 and previous config saved to /var/cache/conftool/dbconfig/20191127-161525-marostegui.json
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P9767 and previous config saved to /var/cache/conftool/dbconfig/20191127-161450-marostegui.json
  • 16:06 ema: cp3050: set proxy.config.http.server_session_sharing.match to "ip" T238494
  • 15:57 _joe_: restarting pybal on lvs1015
  • 15:56 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:55 _joe_: restarting pybal on lvs1016
  • 15:52 jynus: disabling puppet on dbprov1001 to test bacula restore T238048
  • 15:47 papaul: testing power redundancy on scs-a1-codfw
  • 15:47 _joe_: restarting pybal on lvs2003
  • 15:44 _joe_: restarting pybal again on lvs2006
  • 15:42 jynus: migrate db entries of archive Media to backup1001 T238048
  • 15:37 marostegui: Logging retroactively for the record: drop user 'nova'@'%' from m5 - T239170
  • 15:30 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 marostegui: Add grants for dump (10.192.0.114,10.192.16.96) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:27 marostegui: Add grants for dump (10.64.0.95,10.64.16.31) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:25 _joe_: restarting lvs2006 for addition of eventgate-logging-external,blubberoid-https
  • 15:24 moritzm: installing freetype bugfix updates from Buster 10.2 point release
  • 15:21 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=eventgate-logging-external
  • 15:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 moritzm: downgrading trapperkeeper-webserver-jetty9-clojure packages on puppetdb hosts to the version shipped in Buster 10.2
  • 15:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:02 moritzm: remove trapperkeeper-webserver-jetty9-clojure debs from apt.wikimedia.org/buster-wikimedia (these were needed to unbreak TLS on Puppetdb in Buster, but an update landed in Buster 10.2, which replaces our custom hotfix)
  • 14:56 marostegui: Add new grants for nova_cell0 database on m5 - T239170
  • 14:50 marostegui: Create nova_cell0 database on m5 master - T239170
  • 14:43 effie: reimage mw1346, mw1336, mw1326
  • 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:15 effie: reimage mw2285, mw2284, mw2283
  • 14:14 effie: reimage mw2285, mw2286, mw2283
  • 14:01 moritzm: temporarily stop cas on idp1001 for some failover tests
  • 14:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of testwikidatawiki to read from the new term store for items (T225057) (duration: 00m 56s)
  • 13:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:42 ema: cp1075: repool with tslua reloads enabled T233274
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:28 ema: cp1075: ats-{tls,backend} restarted to apply tslua reload changes T233274
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P9766 and previous config saved to /var/cache/conftool/dbconfig/20191127-132359-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9765 and previous config saved to /var/cache/conftool/dbconfig/20191127-132220-marostegui.json
  • 13:21 effie: reimage mw2288, mw2287, mw2286
  • 13:13 effie: reimage mw1348, mw1338, mw1328
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=apache2,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 12:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:18 apergos: reimaged dumpsdata1001 to buster and forgot to use the dang script but it is all ok anyhow :-P
  • 11:47 Amir1: deployed security patch for T237667
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 11:27 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 11:21 effie: reimage mw2289.codfw.wmnet
  • 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:06 ema: cp1075: depool to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552955/ and test tslua reloads T233274
  • 11:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:04 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 effie: reimage mw1347,mw1337,mw1327 - T239054
  • 10:32 ariel@deploy1001: Finished deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files (duration: 00m 03s)
  • 10:32 ariel@deploy1001: Started deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files
  • 09:41 moritzm: installing symfony security updates
  • 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 moritzm: installing php-imagick security updates
  • 09:25 ema: cp3050: re-enable request coalescing after performance experiment T238494
  • 09:02 effie: reimage mw1317.eqiad.wmnet - T239054
  • 09:01 marostegui: Stop replication on db1124:3318 to reimport wikidatawiki.page table on labsdb1010 - T238399
  • 08:24 godog: silence codfw varnish traffic drop until dec 9th - T239039
  • 08:09 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:53 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:51 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:49 elukey: roll restart of eventstreams on scb2* - T239220
  • 07:41 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:15 vgutierrez: repooling cp3063 - T239310
  • 07:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3063.esams.wmnet
  • 07:04 vgutierrez: depool & powercycle cp3063 - T239310
  • 07:03 marostegui: Compress tables on db1102:3314
  • 06:52 marostegui: Remove db2062 from tendril and zarcillo - T238726
  • 06:50 marostegui: Stop MySQL on db2062 - T238726
  • 06:25 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 06:05 marostegui: Promote db2135 to codfw m5 master T238183
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2135 to the config T238183 (duration: 00m 59s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2135 to the config T238183 (duration: 01m 11s)
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2125 T239042', diff saved to https://phabricator.wikimedia.org/P9759 and previous config saved to /var/cache/conftool/dbconfig/20191127-054809-marostegui.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9758 and previous config saved to /var/cache/conftool/dbconfig/20191127-054056-marostegui.json
  • 01:58 krinkle@deploy1001: Synchronized vendor: 4108ff4e2 (3/3) (duration: 01m 00s)
  • 01:56 krinkle@deploy1001: Synchronized wmf-config/: 4108ff4e2 (2/3) (duration: 00m 59s)
  • 01:55 krinkle@deploy1001: Synchronized lib/: 4108ff4e2 (1/3) (duration: 01m 01s)
  • 01:28 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 03s)
  • 00:05 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Show UploadWizard CTA on testcommonswiki (T234960) (duration: 01m 00s)

2019-11-26

  • 23:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WelcomeSurvey for 100% of new users on arwiki (duration: 01m 02s)
  • 23:25 eileen: process-control config revision is ad80b0136c
  • 20:33 jforrester@deploy1001: Synchronized dblists/: Update dblists, now autogenerated (no-op, just comment changes) T223602 (duration: 01m 01s)
  • 20:25 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c282e86]: Followup on T230495 (duration: 00m 59s)
  • 20:24 ebernhardson@deploy1001: Finished deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3 (duration: 00m 42s)
  • 20:24 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c282e86]: Followup on T230495
  • 20:24 ebernhardson@deploy1001: Started deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3
  • 20:06 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495 (duration: 01m 23s)
  • 20:05 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495
  • 19:59 Pchelolo: create partitioned topics for cirrusSearchElasticaWrite on kafka-main T239135
  • 19:57 Urbanecm: Reset email of TheklanBot (T239233)
  • 19:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.8
  • 19:39 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache (duration: 32m 52s)
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9753 and previous config saved to /var/cache/conftool/dbconfig/20191126-192724-marostegui.json
  • 19:22 shdubsh: restore codfw logstash to baseline - T215904
  • 19:09 shdubsh: stop logstash codfw, generate some consumer lag, and set batch size to 2000 - T215904
  • 19:07 ebernhardson@deploy1001: Finished deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml (duration: 00m 29s)
  • 19:07 ebernhardson@deploy1001: Started deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml
  • 19:06 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache
  • 19:04 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.2 (duration: 07m 08s)
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 05s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 02s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 18:55 shdubsh: stop logstash codfw, generate some consumer lag - T215904
  • 18:44 shdubsh: temporarily update pipeline.batch.size to 1000 on logstash2004 - T215904
  • 18:33 shdubsh: stop logstash on logstash200[5-6] for metrics collection - T215904
  • 18:09 brennen: issues with branch.py branch cut; deleted stub wmf/1.35.0-wmf.8 branch and proceeding with standard process
  • 17:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Show UploadWizard CTA in beta (T234960) (duration: 00m 52s)
  • 17:31 brennen: cutting branch for 1.35.0-wmf.8
  • 17:26 paravoid: moving fiberring from cr3-esams:xe-0/0/2 to cr2-esams:xe-0/1/8
  • 17:25 ppchelko@deploy1001: Finished deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015 (duration: 15m 38s)
  • 17:10 ppchelko@deploy1001: Started deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015
  • 17:03 paravoid: above was for cr3-esams
  • 17:03 paravoid: cr2-esams: disable interface xe-0/0/2 (transit)
  • 16:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop Scribunto special-case for HHVM, never reached T235142 (duration: 00m 52s)
  • 16:32 jforrester@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: Drop HHVMRequestInit symlink creation (duration: 00m 52s)
  • 16:31 James_F: No sane way to delete HHVMRequestInit.php with a simple sync-dir, so waiting for the full scap.
  • 16:30 jforrester@deploy1001: Synchronized docroot/noc/conf/: Drop HHVMRequestInit symlink (duration: 00m 52s)
  • 16:27 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a (duration: 08m 37s)
  • 16:19 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a
  • 16:10 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685) (duration: 01m 07s)
  • 16:09 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685)
  • 16:01 ema: cp3050: temporarily disable request coalescing to assess performance impact T238494
  • 15:15 ema: cp3050: repool after failed test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ (reverted) T238494
  • 14:55 bblack: ignore previous message, restarts not necessary
  • 14:53 bblack: rolling through authdns daemon restarts (necessary to reconfigure ANY-address listener) on authdns1001, authdns2001, ganeti3003
  • 14:44 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Raise memory limit on parsoid servers 2/2 (duration: 00m 52s)
  • 14:42 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Raise memory limit on parsoid servers 1/2 (duration: 00m 51s)
  • 14:30 oblivian@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 14:05 ema: cp3050: depool to merge and test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ T238494
  • 13:11 effie: enable puppet on mediawiki servers
  • 13:03 effie: Remove tmpreaper package from all mediawiki servers - T229792
  • 12:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp (T238918) (duration: 00m 53s)
  • 12:07 XioNoX: power down mr1-esams for replacement - T238174
  • 11:36 elukey: reboot stat1007
  • 11:35 marostegui: Deploy schema change on db1139:3311
  • 11:35 effie: enable puppet on mw canary servers, and restart apaches
  • 10:50 hashar: Updated jenkins job operations-puppet-tests-stretch-docker to use latest Docker container
  • 10:30 godog: swift eqiad-prod: add ms-be105[7-9] - T237438
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9749 and previous config saved to /var/cache/conftool/dbconfig/20191126-102442-marostegui.json
  • 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:07 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:45 effie: Disable puppet on all mediawiki servers to test 489982
  • 09:26 marostegui: Deploy schema change on s8 primary master (db1109) - T234066 T233135 T237120
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into s8 vslow,dump', diff saved to https://phabricator.wikimedia.org/P9748 and previous config saved to /var/cache/conftool/dbconfig/20191126-092409-marostegui.json
  • 09:18 marostegui: Run maintain-views for wikidatawiki.protected_titles view on labsdb hosts T233135
  • 07:53 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch Flow to Parsoid/PHP on mw.org -- T229015 (duration: 00m 52s)
  • 07:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266 (duration: 14m 24s)
  • 07:29 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266
  • 07:28 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions (duration: 07m 36s)
  • 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1061 from config - T238624', diff saved to https://phabricator.wikimedia.org/P9745 and previous config saved to /var/cache/conftool/dbconfig/20191126-071746-marostegui.json
  • 07:09 marostegui: Stop MySQL on db1061 - T238624
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1061 from config T238624 (duration: 00m 52s)
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1061 from config T238624 (duration: 00m 54s)
  • 06:51 marostegui: Run compare.py for db2125 - T239042
  • 06:44 marostegui: Remove triggers for ar_comment on db1124:3318 T234704
  • 06:43 marostegui: Deploy schema change on db1087 with replication, lag will be generated on s8 for labsdb hosts
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, and pool db1092 temporarily as vslow,dump for s8, for a schema change on db1087', diff saved to https://phabricator.wikimedia.org/P9744 and previous config saved to /var/cache/conftool/dbconfig/20191126-064200-marostegui.json
  • 06:34 XioNoX: Rename cr2-knams to cr3-knams - T237030
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1086 on s7 master and remove read-only from s7 T238044', diff saved to https://phabricator.wikimedia.org/P9743 and previous config saved to /var/cache/conftool/dbconfig/20191126-060108-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance T238044', diff saved to https://phabricator.wikimedia.org/P9742 and previous config saved to /var/cache/conftool/dbconfig/20191126-060023-marostegui.json
  • 06:00 marostegui: Starting s7 failover from db1062 to db1086 - T238044
  • 05:49 marostegui: Deploy schema change on dbstore1003:3311
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1086 as it will be the new s7 master - T238044', diff saved to https://phabricator.wikimedia.org/P9741 and previous config saved to /var/cache/conftool/dbconfig/20191126-051034-marostegui.json
  • 05:08 marostegui: Start pre-steps for s7 failover - T238044

2019-11-25

  • 23:39 cstone: payments-wiki revision changed from e4d51fe247 to 2eb54fd6ef
  • 23:14 Urbanecm: Evening SWAT done
  • 23:12 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 23:10 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 01s)
  • 23:09 urbanecm@deploy1001: Synchronized dblists/: SWAT: aed2369: Add gewikimedia to special.dblist (T239173) (duration: 00m 52s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: d71b0ab: kask-echoseen: Do not report dupes (T237143) (duration: 00m 53s)
  • 22:13 Jeff_Green: authdns update to deploy I21ddc1a3e
  • 22:04 eileen: civicrm revision changed from 852c4a36bd to 5cf2d2713f, config revision is c4ad2f5990
  • 20:37 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 20:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:07 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:05 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:04 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 19:35 mutante: mw1298 - scap pull
  • 19:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 19:30 ema@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet,service=nginx
  • 19:14 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 19:13 cdanis: restarted grafana-server on grafana1002 T220838
  • 19:11 cdanis: copied snapshot of database from grafana1001 to grafana1002 T220838
  • 19:07 cdanis: stopping grafana-next.wikimedia.org (on grafana1002)
  • 19:06 cdanis: making grafana.wikimedia.org read-only (on grafana1001): sudo chmod -w /var/lib/grafana/grafana.db
  • 18:56 Lucas_WMDE: Morning SWAT done
  • 18:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/TemplateData/: SWAT: Implement ParsoidFetchTemplateData hook for Parsoid/PHP (T238954) (duration: 00m 53s)
  • 18:54 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 18:54 ema: cumin -b1 'A:cp-ats and A:esams' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:eqsin' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:ulsfo' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:52 ema: cumin -b1 'A:cp-ats and A:codfw' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:51 ema: cumin -b1 'A:cp-ats and A:eqiad' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:50 bblack: cp[245]*: wipe daemon.log and restart syslog, again
  • 18:48 mutante: mw1298 - pooling
  • 18:26 bblack: cp[245]*: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:17 bblack: cp4028: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:16 effie: Restart php-fpm on mw* and wtp* servers in eqiad and codfw - T236963
  • 18:07 effie: Upgrade php-wikidiff2 to 1.10.0 to all servers - T236963
  • 17:55 gehel: restart wdqs-updater on all wdqs servers
  • 17:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates (duration: 10m 24s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch private wiki clients (Flow, VE) to Parsoid/PHP -- T229015 (duration: 00m 53s)
  • 17:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates
  • 17:36 marostegui: Upgrade kernel on db2125 T239042
  • 17:25 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates (duration: 12m 23s)
  • 17:19 XioNoX: power down cr2-knams - T237030
  • 17:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa (duration: 08m 58s)
  • 17:12 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates
  • 17:05 arlolra@deploy1001: Started deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa
  • 16:48 jynus: upgrading and restarting dbprov* hosts
  • 15:49 ema: pool cp3064 with varnish-be T227432
  • 15:36 ema: cp3064 create filesystem on /dev/nvme0n1p1 (see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552547/) and reboot T238494
  • 15:22 ema: cp3064 manual reboot after wmf-auto-reimage error: 'Unable to run wmf-auto-reimage-host: Failed to reboot_host' T238494
  • 15:20 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:18 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:14 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 ema: cp1075: ats-tls-restart to enable lua reload T233274
  • 15:10 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:09 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 ema: cp1075: ats-backend-restart to enable lua reload T233274
  • 15:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 15:00 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp3056.esams.wmnet,service=ats-be
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:50 XioNoX: enable cr3-esams:et-1/0/0 - T236767
  • 14:45 ema: depool cp3064 and reimage with varnish-be T227432
  • 14:44 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 14:38 marostegui: Remove triggers from archive table on s1 codfw sanitarium T234704
  • 14:37 marostegui: Deploy schema change on s1 codfw (this will generate lag on codfw) - T234066 T233135
  • 14:23 moritzm: upgrading OpenJDK 11 on an-conf*
  • 14:04 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 elukey: set global read_only=1 on db1108's log database - T159170
  • 13:16 XioNoX: cleanup config on cr3-esams - T237031
  • 13:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:11 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:06 XioNoX: cleanup config on cr2-esams - T237031
  • 13:02 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:59 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:48 XioNoX: bundle esams-knams links on knams side - T237031
  • 12:42 XioNoX: bundle esams-knams links on esams side - T237031
  • 12:27 XioNoX: disable BGP to knams transits - T237031
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Increase main traffic weight for db1126', diff saved to https://phabricator.wikimedia.org/P9735 and previous config saved to /var/cache/conftool/dbconfig/20191125-114821-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P9734 and previous config saved to /var/cache/conftool/dbconfig/20191125-114733-marostegui.json
  • 11:40 effie: cumin -b 2 -s 10 restart php on API servers
  • 11:31 effie: restart php-fpm on mw1314
  • 11:16 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/AbuseFilter/extension.json: SWAT: 29a16bd: Restrict viewing Special:Log/AbuseFilter, and remove from recent changes (T34959) (duration: 01m 04s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 4670d1d: Add throttle rule for WMCL Editathon 2019-12-07 (T238986) (duration: 00m 53s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9394f1f: Allow enwikiversity interface admins to remove their own interface administratorship (T238967) (duration: 00m 57s)
  • 09:45 moritzm: installing cron updates from buster point release
  • 09:32 moritzm: installing systemd security/bugfix updates on buster
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - schema change', diff saved to https://phabricator.wikimedia.org/P9732 and previous config saved to /var/cache/conftool/dbconfig/20191125-093157-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P9731 and previous config saved to /var/cache/conftool/dbconfig/20191125-093038-marostegui.json
  • 09:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: T238822 (duration: 13m 08s)
  • 09:28 _joe_: building and publishing updated images for envoy
  • 09:17 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: T238822
  • 09:13 moritzm: installing python2.7 updates on buster
  • 08:53 _joe_: rebuilding base docker images docker-registry.wikimedia.org/wikimedia-{jessie,stretch,buster}
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 marostegui: Compress db2090
  • 07:04 marostegui: Upgrade db2134
  • 06:24 marostegui: Compress db2080
  • 06:23 marostegui: Compress db2082
  • 06:22 marostegui: Compress db2094:3318
  • 06:18 marostegui: racadm serveraction hardreset on db2125 T239042
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 - schema change', diff saved to https://phabricator.wikimedia.org/P9730 and previous config saved to /var/cache/conftool/dbconfig/20191125-061629-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9729 and previous config saved to /var/cache/conftool/dbconfig/20191125-061542-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9728 and previous config saved to /var/cache/conftool/dbconfig/20191125-060728-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9727 and previous config saved to /var/cache/conftool/dbconfig/20191125-060011-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed T239042', diff saved to https://phabricator.wikimedia.org/P9726 and previous config saved to /var/cache/conftool/dbconfig/20191125-055813-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9725 and previous config saved to /var/cache/conftool/dbconfig/20191125-055305-marostegui.json
  • 03:13 vgutierrez: repooling cp3053 - T239041
  • 03:00 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3053.esams.wmnet
  • 02:59 vgutierrez: depooling & power-cycling cp3053 - T239041
  • 00:10 eileen: also speed the repair process-control config revision is c4ad2f5990

2019-11-24

  • 20:54 eileen: process-control config revision is 371782a667
  • 15:41 ariel@deploy1001: Finished deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps (duration: 00m 03s)
  • 15:41 ariel@deploy1001: Started deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps
  • 15:01 apergos: rebooting dumpsdata1002 to clear up the other half of the nfs issues
  • 14:24 apergos: rebooting snapshot1008 to clear up some nfs + kernel issues

2019-11-23

  • 18:19 gehel: repool wdqs1007, caught up on lag - T238229
  • 14:23 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 55s)
  • 11:56 _joe_: oblivian@cumin1001:~$ sudo cumin -b2 -s60 A:mw-eqiad 'restart-php7.2-fpm'
  • 11:47 _joe_: restarting php7.2-fpm on mw1329
  • 09:49 XioNoX: downtime all ripe-atlas checks until Monday (most likely an upstream issue/maintenance)

2019-11-22

  • 21:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238955 (duration: 00m 53s)
  • 18:02 shdubsh: restore prometheus services default settings - T238807
  • 17:52 _joe_: repooling restbase2018
  • 17:36 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:34 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 shdubsh: clean tombstones on prometheus1004 - T238807
  • 17:09 shdubsh: restart prometheus on prometheus1004 - T238807
  • 16:22 shdubsh: clean tombstones on prometheus1003 - T238807
  • 15:40 XioNoX: renumber AS17639 sessions in eqsin
  • 15:16 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/: Stop outputting anything in case of 304 responses in Special:EntityData (T238901) (duration: 00m 57s)
  • 14:49 _joe_: disabling puppet on restbase2018, testing envoy upgrade T238050
  • 14:48 _joe_: uploaded envoyproxy 1.12.1 to {buster,stretch} T237235
  • 13:11 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T238119 T238524 T237375 T238120)
  • 13:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: T238473 (duration: 00m 52s)
  • 12:34 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)
  • 12:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)
  • 11:59 effie: reload php7 on canaries
  • 11:34 effie: Roll out wikidiff2 1.10.0-1 to canaries - T236963
  • 11:29 effie: upload wikidiff2 1.10.0-1 - T236963
  • 09:59 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 10s)
  • 09:56 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 51s)
  • 09:47 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
  • 09:44 ladsgroup@deploy1001: Synchronized langlist: T238104 (duration: 00m 52s)
  • 09:28 ema: pool cp1081 with ATS backend T227432
  • 09:27 gehel: depool wdqs1007 to allow it to catch up on lag - T238229
  • 09:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/includes/specials/pagers/ContribsPager.php: Remove live hack of limit for T234450 (duration: 00m 54s)
  • 09:19 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T234450 (duration: 00m 55s)
  • 09:07 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:04 gehel: remove blazegraph 2.1.5-wmf.11 from archiva, broken upload
  • 08:54 gehel: restarting blazegraph and updater on wdqs1007
  • 08:54 gehel: restarting blazegraph and updater on edqs1007
  • 08:49 ema: depool cp1081 and reimage as text_ats T227432
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Rebalance weights on s7 in preparation for s7 failover on Tuesday T238044', diff saved to https://phabricator.wikimedia.org/P9722 and previous config saved to /var/cache/conftool/dbconfig/20191122-063145-marostegui.json
  • 03:49 shdubsh: restart prometheus@ops on prometheus1003 T238807
  • 00:46 mutante: xhgui1001/xhgui2001 - rsyncing /srv/mongod from tungsten to /srv/tungsten/mongod/ on both new machines (T158837)
  • 00:37 mutante: tungsten - starting ferm service
  • 00:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move newcomer tasks JSON config from mw.org to local wikis (T237301) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Make non-remote titles work in RemotePageConfigurationLoader (T237301) (duration: 00m 54s)

2019-11-21

  • 23:09 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused CirrusSearch config variable (duration: 00m 52s)
  • 22:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --overwrite --user=Bürgerentscheid . (T238764)
  • 21:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Revert "Add Machine Vision CTA to final step (T234960)", take 2 (duration: 00m 41s)
  • 21:36 mholloway-shell@deploy1001: Scap failed!: 5/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:34 mholloway-shell@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:29 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Add Machine Vision CTA to final step (T234960) (duration: 00m 59s)
  • 21:16 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88 (duration: 06m 29s)
  • 21:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88
  • 20:51 mutante: puppetmaster1001 - revoking puppet certs for xhgui1001/xhgui2001
  • 20:49 mutante: ganeti1003 - switching boot order of xhgui1001 to network and reinstalling with stretch (T238098)
  • 20:16 mforns@deploy1001: Finished deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist (duration: 08m 29s)
  • 20:14 mutante: icinga1001 - systemctl reset-failed
  • 20:08 mforns@deploy1001: Started deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist
  • 19:01 andrewbogott: upgrading designate to 'ocata' on cloudservices1003 and 1004
  • 18:49 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:45 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:13 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis back to Parsoid/JS - T229015 (duration: 00m 52s)
  • 18:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:02 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Use HTTPS for contacting Parsoid/PHP - T229015 (duration: 00m 53s)
  • 17:52 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Switch private wikis to Parsoid/PHP; file 4/4 -- T229015 (duration: 00m 53s)
  • 17:51 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis to Parsoid/PHP; file 3/4 -- T229015 (duration: 00m 51s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch private wikis to Parsoid/PHP; file 2/4 -- T229015 (duration: 00m 53s)
  • 17:48 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: Switch private wikis to Parsoid/PHP; file 1/4 -- T229015 (duration: 00m 53s)
  • 17:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015 (duration: 16m 43s)
  • 17:10 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015
  • 17:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP (duration: 02m 38s)
  • 17:06 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP
  • 16:54 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 16:48 sbassett@deploy1001: Finished scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues. (duration: 16m 42s)
  • 16:31 sbassett@deploy1001: Started scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues.
  • 15:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 15:42 mforns@deploy1001: Finished deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107) (duration: 10m 50s)
  • 15:31 mforns@deploy1001: Started deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107)
  • 15:30 ema: pool cp1079 with ATS backend T227432
  • 15:22 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:19 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:13 akosiaris: purge https://releases.wikimedia.org/charts/eventgate-0.0.13.tgz, https://releases.wikimedia.org/charts/ and https://releases.wikimedia.org/charts/index.yaml
  • 15:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 bblack: DONE testing deployment software changes on authdns cluster, back to normal
  • 15:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 ema: depool cp1079 and reimage as text_ats T227432
  • 14:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: Agent filter changes (duration: 18m 33s)
  • 14:43 bblack: testing deployment software changes on authdns cluster, please hold dns changes for a few!
  • 14:41 thcipriani: restarting Jenkins for update
  • 14:28 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: Agent filter changes
  • 13:59 ema: pool cp1077 with ATS backend T227432
  • 13:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 ema: depool cp1077 and reimage as text_ats T227432
  • 11:53 reedy@deploy1001: Finished scap: T234450 (duration: 19m 20s)
  • 11:42 effie: enable puppet on all mw hosts
  • 11:33 reedy@deploy1001: Started scap: T234450
  • 11:09 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e4861ec: Set correct language for shywiktionary (T238105) (duration: 00m 52s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 68d2003: Restrict editing CNBanner namespace to autoconfirmed on metawiki (T238723) (duration: 00m 54s)
  • 11:05 effie: disable puppet on mw[1-2]*
  • 10:49 volans: restarting tcpircbot-logmsgbot on icinga1001, has failed to log some messages, no useful log on the host
  • 10:22 ema: pool cp2023 with Varnish backend T238817 T227432
  • 10:18 arturo: update buster-wikimedia thirdparty/kubeadm-k8s packages (newer version will be used to handle T238654)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9714 and previous config saved to /var/cache/conftool/dbconfig/20191121-095401-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9713 and previous config saved to /var/cache/conftool/dbconfig/20191121-093958-marostegui.json
  • 09:39 ema: depool cp2023 and reimage back as varnish-be T238817 T227432
  • 09:38 marostegui: Stop MySQL on db1067 - T238297
  • 09:27 marostegui: Upgrade db1090:3312, db1090:3317
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P9712 and previous config saved to /var/cache/conftool/dbconfig/20191121-092554-marostegui.json
  • 09:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9711 and previous config saved to /var/cache/conftool/dbconfig/20191121-090623-marostegui.json
  • 09:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 08:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9710 and previous config saved to /var/cache/conftool/dbconfig/20191121-085644-marostegui.json
  • 08:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9709 and previous config saved to /var/cache/conftool/dbconfig/20191121-084500-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9708 and previous config saved to /var/cache/conftool/dbconfig/20191121-083322-marostegui.json
  • 08:21 marostegui: Upgrade db1079
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for upgrade', diff saved to https://phabricator.wikimedia.org/P9707 and previous config saved to /var/cache/conftool/dbconfig/20191121-082108-marostegui.json
  • 07:57 akosiaris: upgrade OTRS to 5.0.39 T225925
  • 07:56 marostegui: Promote db2133 to codfw m2 master - T238183
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9706 and previous config saved to /var/cache/conftool/dbconfig/20191121-072543-marostegui.json
  • 07:18 marostegui: Upgrade db1125 (sanitarium)
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9705 and previous config saved to /var/cache/conftool/dbconfig/20191121-071758-marostegui.json
  • 06:56 marostegui: Repool labsdb1009
  • 06:32 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db1124:3313 T238115 T238114 T237373 T238522 T236404
  • 06:30 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db2094:3313 T238115 T238114 T237373 T238522 T236404
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9704 and previous config saved to /var/cache/conftool/dbconfig/20191121-062412-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9703 and previous config saved to /var/cache/conftool/dbconfig/20191121-061711-marostegui.json
  • 06:16 marostegui: Compress db2081
  • 06:13 marostegui: Stop MySQL on db1107 T238113
  • 06:06 marostegui: Compress db2083
  • 05:57 marostegui: Depool labsdb1009 for upgrade
  • 05:56 marostegui: Upgrade db1086
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for upgrade', diff saved to https://phabricator.wikimedia.org/P9702 and previous config saved to /var/cache/conftool/dbconfig/20191121-055557-marostegui.json
  • 05:53 marostegui: Compress db2073
  • 00:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config does not seem to be applying on half the app servers, resyncing (duration: 00m 52s)
  • 00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable suggested edits without opt-in (T227728) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Finished scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n) (duration: 15m 57s)
  • 00:02 catrope@deploy1001: Started scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n)

2019-11-20

  • 23:14 Amir1: finished creating five wikis, total duration 134 minutes
  • 23:14 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
  • 23:11 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 50s)
  • 23:10 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238105 (duration: 00m 52s)
  • 23:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238105 (duration: 00m 51s)
  • 23:08 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238105 (duration: 00m 51s)
  • 23:05 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238105
  • 22:59 ladsgroup@deploy1001: Synchronized dblists: T238105 (duration: 00m 53s)
  • 22:49 ladsgroup@deploy1001: Synchronized langlist: T238104 (duration: 00m 51s)
  • 22:48 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238104 (duration: 00m 52s)
  • 22:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238104 (duration: 00m 52s)
  • 22:43 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238104 (duration: 00m 51s)
  • 22:41 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238104
  • 22:36 ladsgroup@deploy1001: Synchronized dblists: T238104 (duration: 00m 52s)
  • 22:22 ladsgroup@deploy1001: Synchronized langlist: T237369 (duration: 00m 53s)
  • 22:21 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T237369 (duration: 00m 52s)
  • 22:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237369 (duration: 00m 51s)
  • 22:17 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T237369 (duration: 00m 51s)
  • 22:15 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T237369
  • 22:11 ladsgroup@deploy1001: Synchronized dblists: T237369 (duration: 00m 52s)
  • 22:00 Urbanecm: Wiki creation continues
  • 21:56 ladsgroup@deploy1001: Synchronized langlist: T236861 (duration: 00m 52s)
  • 21:55 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T236861 (duration: 00m 51s)
  • 21:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236861 (duration: 00m 52s)
  • 21:52 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T236861 (duration: 00m 51s)
  • 21:49 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T236861
  • 21:44 ladsgroup@deploy1001: Synchronized dblists: T236861 (duration: 00m 52s)
  • 21:38 Urbanecm: mwscript createAndPromote.php --wiki=gewikimedia --sysop --bureaucrat Mehman97 <password redacted> (T236389)
  • 21:35 gehel: repool wdqs1004 - T238229
  • 21:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:29 urbanecm@deploy1001: Synchronized static/images/project-logos/: new wiki gewikimedia (T236389) (duration: 00m 53s)
  • 21:28 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:27 ejegg: Fundraising CiviCRM updated from 2802bdd649 to 852c4a36bd
  • 21:23 mutante: notebook1003 - systemctl start nagios-nrpe-server (second time today already, T212824)
  • 21:20 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: new wiki gewikimedia (T236389)
  • 21:16 urbanecm@deploy1001: Synchronized dblists: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:01 ssastry@deploy1001: Finished deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix (duration: 07m 20s)
  • 20:53 ssastry@deploy1001: Started deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix
  • 20:37 ssastry@deploy1001: Finished deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d (duration: 09m 14s)
  • 20:27 ssastry@deploy1001: Started deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d
  • 20:27 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:23 mutante: notebook1003 - sudo systemctl nagios-nrpe-server (as usual ....)
  • 20:19 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:31 ejegg: updated fundraising internal dashboard from 69fdbec60d to 8fc2726736
  • 19:04 mutante: xhgui1001 - initial puppet run, signed puppet cert on puppetmaster1001
  • 18:56 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 50s)
  • 18:51 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 54s)
  • 18:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 170 (duration: 00m 53s)
  • 18:31 mutante: ganeti - introducing and installing buster on new VMs xhgui1001/xhgui2001 - for replacing tungsten (jessie) T238098
  • 18:17 mobrovac: morning SWAT done
  • 18:17 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.5/includes/libs/virtualrest/ParsoidVirtualRESTService.php: Parsoid VRS: Add the Host header - T229015 T229078 T229074 (duration: 00m 52s)
  • 18:13 shdubsh: restart mtail on fermium
  • 17:40 ema: pool cp2023 with ATS backend T227432
  • 17:24 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:21 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:19 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 17:18 andrewbogott: upgrading pdns to version 4 on cloudservices1003
  • 17:06 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:04 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:03 andrewbogott: upgrading pdns to version 4 on cloudvirt1004 T210715
  • 16:58 andrewbogott: disabling puppet on cloudvirt1003 and 1004 for T210715
  • 16:55 moritzm: installing rpcbind bugfix updates from buster 10.2 point release
  • 16:43 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:23 ema: depool cp2023 and reimage as text_ats T227432
  • 16:14 ema: pool cp2019 with ATS backend T227432
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9695 and previous config saved to /var/cache/conftool/dbconfig/20191120-160813-marostegui.json
  • 16:03 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 15:42 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: [BETA-ONLY] Switch Flow to use Parsoid/PHP - T229078 (duration: 00m 52s)
  • 15:40 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 15:19 ema: depool cp2019 and reimage as text_ats T227432
  • 15:08 gehel: reset LVS weight for wdqs public eqiad to 10
  • 15:05 effie: Enable puppet on mw*
  • 14:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 14:50 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (use altered lag, not raw lag) gerrit:552072 (duration: 00m 53s)
  • 14:49 ema: pool cp2016 with ATS backend T227432
  • 14:47 effie: disable puppet on all mw* servers
  • 14:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:06 ema: depool cp2016 and reimage as text_ats T227432
  • 13:32 godog: updated puppet compiler facts on compiler100* hosts
  • 12:43 ema: pool cp2013 with ATS backend T227432
  • 12:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:08 ema: depool cp2013 and reimage as text_ats T227432
  • 11:59 ema: pool cp2012 with ATS backend T227432
  • 11:55 Urbanecm: EU SWAT done
  • 11:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2b13fbe: [rowiki] Enable deleterevision for patrollers (T234051) (duration: 00m 52s)
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 51ecd71: Partial cleanup of InitializeSettings (T231178) (duration: 00m 52s)
  • 11:42 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f847380: Set namespace alias for Index: (NS 102/103) for elwikisource (T237253) (duration: 00m 54s)
  • 11:36 urbanecm@deploy1001: Finished scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes (duration: 06m 15s)
  • 11:30 urbanecm@deploy1001: Started scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes
  • 11:27 ema: cp2010: ats-backend-restart to clear backend restart alert
  • 11:21 ema: depool cp2012 and reimage as text_ats T227432
  • 11:15 ema: pool cp2010 with ATS backend T227432
  • 10:54 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716 (duration: 13m 56s)
  • 10:34 ema: depool cp2010 and reimage as text_ats T227432
  • 10:30 marostegui: Upgrade db1116
  • 10:22 mobrovac@deploy1001: Started deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P9694 and previous config saved to /var/cache/conftool/dbconfig/20191120-101727-marostegui.json
  • 10:14 marostegui: Compress db2095:3314
  • 10:07 mobrovac@deploy1001: Finished deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716 (duration: 14m 54s)
  • 09:56 marostegui: Compress db2106
  • 09:52 mobrovac@deploy1001: Started deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716
  • 09:48 marostegui: Compress dbstore1005:3318
  • 09:47 marostegui: Compress dbstore1004:3314
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9693 and previous config saved to /var/cache/conftool/dbconfig/20191120-093308-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9692 and previous config saved to /var/cache/conftool/dbconfig/20191120-092337-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9691 and previous config saved to /var/cache/conftool/dbconfig/20191120-090739-marostegui.json
  • 08:55 marostegui: Upgrade db1094
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for upgrade', diff saved to https://phabricator.wikimedia.org/P9690 and previous config saved to /var/cache/conftool/dbconfig/20191120-085448-marostegui.json
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:43 marostegui: Promote db2132 as m1-codfw master - T238183
  • 07:19 marostegui: Upgrade db2062
  • 07:19 marostegui: Upgrade db2078
  • 07:14 marostegui: Deploy schema change on s3 (testwikidatawiki) directly on s3 primary master T237120
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P9688 and previous config saved to /var/cache/conftool/dbconfig/20191120-070511-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1136', diff saved to https://phabricator.wikimedia.org/P9687 and previous config saved to /var/cache/conftool/dbconfig/20191120-065718-marostegui.json
  • 06:44 marostegui: Upgrade db2118 (s7 codfw master)
  • 06:41 marostegui: Repool labsdb1011
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1136 into s7 api', diff saved to https://phabricator.wikimedia.org/P9686 and previous config saved to /var/cache/conftool/dbconfig/20191120-064022-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136 after upgrade', diff saved to https://phabricator.wikimedia.org/P9685 and previous config saved to /var/cache/conftool/dbconfig/20191120-063628-marostegui.json
  • 06:28 marostegui: Upgrade db1136
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for upgrade', diff saved to https://phabricator.wikimedia.org/P9684 and previous config saved to /var/cache/conftool/dbconfig/20191120-062749-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after upgrade', diff saved to https://phabricator.wikimedia.org/P9683 and previous config saved to /var/cache/conftool/dbconfig/20191120-062029-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9682 and previous config saved to /var/cache/conftool/dbconfig/20191120-061938-marostegui.json
  • 05:58 marostegui: Stop MySQL on db1101:3317, db1101:3318 for upgrade and schema change
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for upgrade and schema change', diff saved to https://phabricator.wikimedia.org/P9681 and previous config saved to /var/cache/conftool/dbconfig/20191120-055732-marostegui.json
  • 05:55 marostegui: Depool labsdb1011 for upgrade
  • 05:54 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1105:3311 db1097:3314 db1098:3316 db1098:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9680 and previous config saved to /var/cache/conftool/dbconfig/20191120-055426-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P9679 and previous config saved to /var/cache/conftool/dbconfig/20191120-054840-marostegui.json
  • 03:16 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php kowiki --cutoff 350
  • 02:57 vgutierrez: restarting pybal on lvs2002
  • 02:54 vgutierrez: restarting pybal on lvs2005
  • 02:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 02:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 00:10 mutante: phab2001 - restart ssh-phab service after repooling it after buster reinstall, it wasn't listening on the IPv6 IP, causing LVS/pybal alerts
  • 00:06 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Pass token as editing_session_id for suggested edits (T238249) (duration: 00m 53s)
  • 00:02 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 52s)
  • 00:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikiEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 54s)

2019-11-19

  • 23:58 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MobileFrontend/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 53s)
  • 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikimediaEvents/: EditAttemptStep: Allow other extensions to trigger oversampling (T238249) (duration: 00m 53s)
  • 23:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 21:45 XioNoX: rebooting pfw3-codfw:node1 for upgrade - T235150
  • 21:14 XioNoX: rebooting pfw3-codfw for upgrade - T235150
  • 20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:17 gehel: completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 20:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:10 XioNoX: homer push on mgmt routers
  • 20:09 mutante: phab1003 after merging gerrit:551910 puppet now also stopped the actual aphlict service and removed the systemd unit file. had to manually run 'systemctl reset-failed' though to clean systemd status and avoid icinga alert (T238593)
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:18 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 19:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286) (duration: 06m 49s)
  • 19:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286)
  • 19:00 elukey: regenerate TLS cert for yarn.wikimedia.org (containing SANs for all analytics UIs) to add datasets.w.o SAN (site was failing due to ATS not being able to contact thorium)
  • 18:59 rlazarus: restarted php7.2-fpm on wtp2001, wtp2002
  • 18:56 rlazarus: restarted php7.2-fpm on wtp1025, wtp1026
  • 18:35 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: Unbreak instrumentation of init events (duration: 00m 53s)
  • 18:34 ssastry@deploy1001: Finished deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7 (duration: 02m 04s)
  • 18:32 ssastry@deploy1001: Started deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7
  • 18:30 mutante: icinga config - manually added team-dcops, started icinga
  • 18:20 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, hook) gerrit:551858 (duration: 00m 53s)
  • 18:12 RoanKattouw: That was eowiktionary, not eowikisource
  • 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure default search namespaces for eowikisource (T237792) (duration: 00m 52s)
  • 17:43 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, maint script) gerrit:551857 (duration: 00m 52s)
  • 17:39 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:11 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag) gerrit:551855 gerrit:551856 (duration: 00m 54s)
  • 17:02 volker-e@deploy1001: Finished deploy [design/style-guide@d73818a]: Deploy design/style-guide: (duration: 00m 07s)
  • 17:02 volker-e@deploy1001: Started deploy [design/style-guide@d73818a]: Deploy design/style-guide:
  • 16:58 ema: pool cp2007 with ATS backend T227432
  • 16:30 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 moritzm: installing glib2.0 security updates
  • 16:21 mutante: phab1003 - puppet restarts aphlict service even with "phabricator_aphlict_enabled: false" in Hiera. But it does properly remove the proxy config lines from apache. so service is running but not used. (T238593)
  • 16:17 mutante: phab1003 - systemctl stop aphlict (proxy config in apache is disabled as well as disabled in ATS) (T238593)
  • 16:15 gehel: reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 16:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:10 ema: depool cp2007 and reimage as text_ats T227432
  • 16:09 ema: pool cp2006 with ATS backend T227432
  • 15:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure (duration: 02m 11s)
  • 15:57 mobrovac@deploy1001: Started deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure
  • 15:37 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:34 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015 (duration: 14m 22s)
  • 15:15 ema: depool cp2006 and reimage as text_ats T227432
  • 15:13 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015
  • 15:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP (duration: 02m 58s)
  • 15:07 ema: pool cp2004 with ATS backend T227432
  • 15:06 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP
  • 14:38 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:34 gehel: restarting blazegraph with additional logging on wdqs1004 - T231411
  • 14:18 ema: depool cp2004 and reimage as text_ats T227432
  • 14:13 ema: pool cp2001 with ATS backend T227432
  • 13:57 marostegui: Deploy schema change on metawiki directly on s7 master T238370
  • 13:57 marostegui: Deploy schema change on mediawikiwiki directly on s7 master T238370
  • 13:55 marostegui: Deploy schema change on mediawikiwiki directly on s3 master T238370
  • 13:50 marostegui: Deploy schema change on foundationwiki directly on s3 master - T238370
  • 13:46 marostegui: Deploy schema change on labswiki (wikitech) - T238370
  • 13:39 marostegui: Deploy schema change on db1092
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P9673 and previous config saved to /var/cache/conftool/dbconfig/20191119-133850-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9672 and previous config saved to /var/cache/conftool/dbconfig/20191119-133704-marostegui.json
  • 13:34 ema@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:33 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:14 ema: depool cp2001 and reimage as text_ats T227432
  • 12:42 jbond42: add libapache2-mod-auth-cas 1.2-1 to stretch-wikimedia repo
  • 12:28 effie: enable puppet on P:mediawiki::php and *.eqiad.wmnet
  • 12:22 effie: enable puppet on P:mediawiki::php and *.codfw.wmnet
  • 12:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 12:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 11:41 gehel: depooling wdqs1004 - T231411
  • 11:37 gehel: restarting wdqs blazegraph on wdqs1004 - T231411
  • 11:29 marostegui: Upgrade dbstore1003 (3311,3315,3317)
  • 11:16 gehel: restarting wdqs updater on wdqs1004 - T231411
  • 10:36 marostegui: Compress and upgrade db1098:3316
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9671 and previous config saved to /var/cache/conftool/dbconfig/20191119-103540-marostegui.json
  • 10:34 marostegui: Compress and upgrade db1098:3317
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9670 and previous config saved to /var/cache/conftool/dbconfig/20191119-103426-marostegui.json
  • 10:29 marostegui: Upgrade db2077
  • 10:24 marostegui: Upgrade db2120 db2121 db2122
  • 10:10 marostegui: Upgrade MySQL on db2086 db2087 db2100
  • 10:06 godog: repool centrallog2001
  • 09:40 effie: disable puppet on P:mediawiki::php - T229792
  • 09:21 moritzm: installing ncurses security updates
  • 09:20 moritzm: rolling restart of nginx on acmechief/puppetdb to pick up libxslt security updates
  • 09:08 moritzm: installing libxslt security updates
  • 09:08 marostegui: Deploy schema change on db1101:3318
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9669 and previous config saved to /var/cache/conftool/dbconfig/20191119-090823-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9668 and previous config saved to /var/cache/conftool/dbconfig/20191119-090745-marostegui.json
  • 09:05 marostegui: Repool labsdb1010
  • 07:33 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Enable math links in Beta - T208758 (duration: 00m 53s)
  • 06:45 marostegui: Stop MySQL on db2061 T238526
  • 06:44 marostegui: Remove db2061 from tendril and zarcillo T238526
  • 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2061 from config T238526 (duration: 00m 52s)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2061 from config T238526 (duration: 00m 53s)
  • 06:26 vgutierrez: Move cp1089 from nginx to ats-tls - T231627
  • 06:20 marostegui: Depool labsdb1010 for upgrade
  • 06:02 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1131 to s6 master and remove read-only from s6 T235469', diff saved to https://phabricator.wikimedia.org/P9667 and previous config saved to /var/cache/conftool/dbconfig/20191119-060203-marostegui.json
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance T235469', diff saved to https://phabricator.wikimedia.org/P9666 and previous config saved to /var/cache/conftool/dbconfig/20191119-060122-marostegui.json
  • 06:01 marostegui: Starting s6 failover from db1061 to db1131 - T235469
  • 05:37 eileen: process control - I reverted the above to check some stuff first
  • 05:36 vgutierrez: Move cp1087 from nginx to ats-tls - T231627
  • 05:26 marostegui: Deploy schema change on db1099:3318
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9665 and previous config saved to /var/cache/conftool/dbconfig/20191119-052632-marostegui.json
  • 05:25 marostegui: Compress db1097:3314
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9664 and previous config saved to /var/cache/conftool/dbconfig/20191119-052412-marostegui.json
  • 05:17 vgutierrez: Move cp1085 from nginx to ats-tls - T231627
  • 05:14 marostegui: Compress tables on db1105:3311
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9663 and previous config saved to /var/cache/conftool/dbconfig/20191119-051344-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after compression', diff saved to https://phabricator.wikimedia.org/P9662 and previous config saved to /var/cache/conftool/dbconfig/20191119-051259-marostegui.json
  • 05:12 eileen: process-control config revision is 9fbfc79988 - change gap on repair job to 16 hours to reflect the with-daylight-savings ones
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T235469 ', diff saved to https://phabricator.wikimedia.org/P9661 and previous config saved to /var/cache/conftool/dbconfig/20191119-050748-marostegui.json
  • 05:02 marostegui: Start pre-switchover steps T235469
  • 04:47 vgutierrez: Move cp2023 from nginx to ats-tls - T231627
  • 04:17 vgutierrez: Move cp2019 from nginx to ats-tls - T231627
  • 03:53 vgutierrez: Move cp2016 from nginx to ats-tls - T231627
  • 03:51 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php cswiki --cutoff 350
  • 03:37 vgutierrez: Move cp2013 from nginx to ats-tls - T231627
  • 01:12 ejegg: re-enabled fundraising CiviCRM contact de-duplication jobs
  • 01:05 ejegg: disabled fundraising CiviCRM contact de-duplication jobs
  • 00:54 ejegg: updated civicrm from 1f454aa69a to 2802bdd649
  • 00:39 mutante: phab2001 - rsyncing /srv/repos data from phab1003 (T190568)
  • 00:30 mutante: rebooting phab2001

2019-11-18

  • 23:52 catrope@deploy1001: Finished scap: Update GrowthExperiments to master in wmf.5 (includes i18n) (duration: 19m 57s)
  • 23:37 mutante: phab2001 - restart ssh-phab service after reimaging (some race condition binding to the IP before getting it on the interface after fresh install), reschedule pybal checks (T190568)
  • 23:32 catrope@deploy1001: Started scap: Update GrowthExperiments to master in wmf.5 (includes i18n)
  • 22:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001.codfw.wmnet
  • 22:39 eileen: civicrm revision changed from c05c302e54 to 1f454aa69a, config revision is 67685c12f5
  • 22:31 mutante: phab2001 - reinstalling with buster (T190568)
  • 21:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 arlolra: Upgraded Parsoid to 2245b8f (T237886, T237103, T236864, T237569, T236930, T237463, T236867, T234266)
  • 21:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 21:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f (duration: 08m 22s)
  • 21:39 arlolra@deploy1001: Started deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f
  • 20:59 mutante: phab1003 - re-enabling puppet after merging gerrit:551271 - making sure aphlict stays disabled incl. the apache config ProxyPass lines using mod_proxy_wstunnel (T238593)
  • 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after some compression', diff saved to https://phabricator.wikimedia.org/P9659 and previous config saved to /var/cache/conftool/dbconfig/20191118-202259-marostegui.json
  • 19:03 ejegg: updated payments-wiki from 30579d34d8 to 3f99ebecc7
  • 18:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater (duration: 13m 27s)
  • 18:07 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater
  • 17:44 cdanis: rebooting grafana1002 (currently test host not used in prod)
  • 17:08 marostegui: Deploy schema change on db1116:3318
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9658 and previous config saved to /var/cache/conftool/dbconfig/20191118-165410-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after compression', diff saved to https://phabricator.wikimedia.org/P9656 and previous config saved to /var/cache/conftool/dbconfig/20191118-164923-marostegui.json
  • 16:40 cdanis: cdanis@install1002.wikimedia.org: sudo -E reprepro --restrict grafana update buster-wikimedia
  • 16:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on remaining wikis for T198312 (duration: 00m 53s)
  • 14:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 13m 58s)
  • 14:34 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:34 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 02m 30s)
  • 14:31 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary (duration: 02m 45s)
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary
  • 14:27 arturo: imported openstack ocata deb packages into stretch-wikimedia/thirdparty/openstack-ocata-stretch (T238338)
  • 14:22 marostegui: Deploy schema change on dbstore1005:3318
  • 13:10 ema: cp-ats: rolling ats-{tls,backend} restart to apply log_buffer_size config changes T237608
  • 12:51 Urbanecm: Run mwscript recountCategories.php --wiki=cswiki --mode={subcats,pages,files} (T228585)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=files (T238500)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=pages (T238500)
  • 12:47 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=subcats (T238500)
  • 11:32 awight: EU SWAT complete
  • 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Cite: SWAT: Track pageviews only on content page views, not edits (T214493) (duration: 00m 51s)
  • 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Popups: SWAT: Don't record Popups actions on non-content pages (T214493) (duration: 00m 51s)
  • 11:04 moritzm: installing postgresql-common security updates
  • 10:56 moritzm: installing python-werkzeug security updates
  • 10:56 marostegui: Deploy schema change on db2078 (codfw master for wikidatawiki), this will create lag on s8 codfw - T237120
  • 10:53 moritzm: installing gdb updates from buster point release
  • 10:49 moritzm: installing python-cryptography bugfix updates from buster point release
  • 10:45 moritzm: updated buster netinst image for 10.2 T238519
  • 10:16 marostegui: Upgrade MySQL on labsdb1012
  • 09:33 godog: remove wezen from service, pending reimage
  • 09:11 marostegui: Remove ar_comment from triggers on db2094:3318 - T234704
  • 09:11 marostegui: Deploy schema change on s8 codfw, this will generate lag on s8 codfw - T233135 T234066
  • 09:03 marostegui: Restart MySQL on db1124 and db1125 to apply new replication filters T238370
  • 07:17 marostegui: Upgrade and restart mysql on sanitarium hosts on codfw to pick up new replication filters: db2094 and db2095 - T238370
  • 07:09 marostegui: Stop MySQL on db2070 to clone db2135 - T238183
  • 06:52 vgutierrez: Move cp1083 from nginx to ats-tls - T231627
  • 06:32 vgutierrez: Move cp1081 from nginx to ats-tls - T231627
  • 06:30 marostegui: Restart tendril mysql - T231769
  • 06:12 vgutierrez: Move cp2012 from nginx to ats-tls - T231627
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9652 and previous config saved to /var/cache/conftool/dbconfig/20191118-060508-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for compression', diff saved to https://phabricator.wikimedia.org/P9651 and previous config saved to /var/cache/conftool/dbconfig/20191118-060207-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2072, db2088:3311, db2087:3316, db2086:3317 after maintenances and schema changes', diff saved to https://phabricator.wikimedia.org/P9650 and previous config saved to /var/cache/conftool/dbconfig/20191118-060114-marostegui.json
  • 05:53 marostegui: Deploy schema change on s5 primary master db1100 - T233135 T234066
  • 03:40 vgutierrez: Move cp2007 from nginx to ats-tls - T231627
  • 00:44 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/PageHistoryCountHandler.php: fix extremely slow query T238378 (duration: 00m 59s)

2019-11-16

  • 20:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:25 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:17 effie: restart rsyslog on mw2221
  • 09:43 elukey: systemctl restart hadoop-* on analytics1077 after oom killer

2019-11-15

  • 22:14 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:31 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:29 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 _joe_: disabling proxying to ws on phabricator1003
  • 20:04 XioNoX: push pfw policies to pfw3-eqiad - T238368
  • 20:02 XioNoX: push pfw policies to pfw3-codfw - T238368
  • 19:07 XioNoX: remove vlan 1 trunking between msw1-codfw and mr1-codfw, will cause a quick connectivity issue - T228112
  • 18:07 XioNoX: homer push on management switches
  • 17:30 mutante: phabricator - started phd service
  • 17:11 XioNoX: homer push to management routers (https://gerrit.wikimedia.org/r/550576)
  • 16:43 hashar: Restored zuul-merger / CI for operations/puppet.git
  • 16:29 hashar: CI slowed down due to a huge spike of internal jobs. Being flushed as of now # T140297
  • 16:25 bblack: repool cp2001
  • 16:08 bblack: depool cp2001 for experiments
  • 16:02 moritzm: rebooting rpki1001 to rectify microcode loading
  • 16:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:51 ejegg: updated Fundraising CiviCRM from ae9b3819cd to c05c302e54
  • 15:36 ejegg: reduced batch size of CiviCRM contact deduplication jobs
  • 15:11 ema: pool cp3064 with ATS backend T227432
  • 15:07 ema: reboot cp3064 after reimage
  • 14:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:49 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 ema: depool cp3064 and reimage as text_ats T227432
  • 14:17 godog: SIGHUP prometheus@ops on prometheus1004
  • 14:13 bblack: lvs1013 - pybal restart for new config
  • 14:13 bblack: lvs2001 - pybal restart for new config
  • 14:13 bblack: lvs5001 - pybal restart for new config
  • 14:13 bblack: lvs4005 - pybal restart for new config
  • 14:12 bblack: lvs3005 - pybal restart for new config
  • 14:11 bblack: lvs5003 - pybal restart for new config
  • 14:11 bblack: lvs4007 - pybal restart for new config
  • 14:11 bblack: lvs3007 - pybal restart for new config
  • 14:10 bblack: lvs2004 - pybal restart for new config
  • 14:09 bblack: lvs1016 - pybal restart for new config
  • 13:28 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 03s)
  • 13:28 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 13:06 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure) (duration: 00m 04s)
  • 13:06 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure)
  • 11:43 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 09s)
  • 11:43 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 11:27 moritzm: reboot ganeti4001-4003 to rectify microcode application
  • 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315 into vslow,dump after schema change', diff saved to https://phabricator.wikimedia.org/P9645 and previous config saved to /var/cache/conftool/dbconfig/20191115-112520-marostegui.json
  • 11:19 marostegui: Reboot dbproxy2002
  • 11:15 marostegui: Reboot dbproxy2004
  • 11:12 marostegui: Reboot dbproxy2001
  • 10:45 marostegui: Run maintain-views for s5 on labsdb1011 T233135
  • 10:38 moritzm: installing ghostscript security updates
  • 10:37 mobrovac: restbase - truncated parsoidphp data tables - T229015
  • 10:36 ema: pool cp3062 with ATS backend T227432
  • 10:24 godog: roll-restart logstash to apply configuration change
  • 10:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 ema: depool cp3062 and reimage as text_ats T227432
  • 09:47 vgutierrez: Use a synthetic warning for 1% of TLSv1/TLSv1.1 pageviews - T238038
  • 09:18 vgutierrez: Move cp1079 from nginx to ats-tls - T231627
  • 09:13 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 09:02 vgutierrez: Move cp1077 from nginx to ats-tls - T231627
  • 08:42 vgutierrez: Move cp2006 from nginx to ats-tls - T231627
  • 08:30 vgutierrez: Move cp2004 from nginx to ats-tls - T231627
  • 06:41 marostegui: Stop MySQL on db2065 to clone db2134 (this will trigger an haproxy irc alert) - T238183
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change and temporary pool db1082 into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9643 and previous config saved to /var/cache/conftool/dbconfig/20191115-060807-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9642 and previous config saved to /var/cache/conftool/dbconfig/20191115-060425-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 db1082 after schema changes', diff saved to https://phabricator.wikimedia.org/P9641 and previous config saved to /var/cache/conftool/dbconfig/20191115-060300-marostegui.json
  • 05:57 marostegui: Run maintain-views for s5 on labsdb1009, labsdb1010, labsdb1012 (pending labsdb1011 as it is still running the schema change) T233135
  • 05:07 vgutierrez: Move cp3064 from nginx to ats-tls - T231627
  • 04:38 volker-e@deploy1001: Finished deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:38 volker-e@deploy1001: Started deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide:
  • 04:17 vgutierrez: Move cp3062 from nginx to ats-tls - T231627
  • 04:00 vgutierrez: Move cp3060 from nginx to ats-tls - T231627
  • 01:35 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/CompareHandler.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 53s)
  • 01:33 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/coreRoutes.json: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 52s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/parser/Parser.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 54s)

2019-11-14

  • 23:03 mutante: restarting gerrit to increase defaultThreadPoolSize to 2
  • 22:29 eileen: civicrm revision changed from a3714003ff to ae9b3819cd, config revision is 6adc66a20b
  • 21:32 ssastry@deploy1001: Finished deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415 (duration: 08m 21s)
  • 21:24 ssastry@deploy1001: Started deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415
  • 21:14 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:06 cdanis@cumin2001: dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json
  • 20:04 gehel: reloading data on wdqs1004 from wdqs1007 to catch up on lag faster - T238229
  • 19:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:31 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:49 catrope@deploy1001: Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)
  • 18:37 catrope@deploy1001: Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)
  • 18:35 catrope@deploy1001: Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)
  • 18:34 mutante: scandium - restart php7.2-fpm
  • 18:31 mutante: phabricator (phab1003, prod server) - upgrade PHP version to 7.2.24 (T237239)
  • 18:17 cdanis@cumin2001: dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json
  • 17:46 robh: running dell epsa tool on cp3056 per T236497
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 ejegg: updated payments-wiki from bd907656fb to 30579d34d8
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 16:09 mutante: phab2001 - upgrading PHP version to 7.2.24 (T237239)
  • 16:06 mutante: scandium - upgrading PHP version to 7.2.24 (fyi, @subbu T228069) (T237239)
  • 16:04 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase: Put a layer of APC cache on top of reading wb_terms in SqlEntityInfoBuilder (T231011 T229407 T236681), Try II (duration: 00m 56s)
  • 14:54 ema: pool cp3060 with ATS backend T227432
  • 14:53 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix bug when looking up entity for an unknown ID (duration: 00m 53s)
  • 14:48 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group1 for T198312 (duration: 00m 53s)
  • 14:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: depool cp3060 and reimage as text_ats T227432
  • 13:37 ladsgroup@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 13:35 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 13:06 bblack: removing digicert-2019 files from cache nodes - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/550829/
  • 12:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation (duration: 14m 52s)
  • 12:09 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation
  • 11:58 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation (duration: 02m 50s)
  • 11:55 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation
  • 11:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:48 vgutierrez: Rolling restart of ats-tls/ats-backend to upgrade to 8.0.5-1wm11 - T238307
  • 10:44 vgutierrez: uploaded trafficserver-8.0.5-1wm11 to apt.wikimedia.org (stretch) - T238307
  • 10:43 ema: pool cp3058 with ATS backend T227432
  • 10:25 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:20 godog: netbox1001 bandaid/symlink /srv/deployment/netbox/deploy/src/netbox/project-static to 'static'
  • 10:06 gehel: copying journal from wdqs1007 to wdqs1005 - T238232
  • 10:05 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 10:03 Urbanecm: Run deleteEqualMessages.php --delete for cswiki and viwiki
  • 09:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 gehel: depool wdqs (public) eqiad - high lag - T238229
  • 09:34 ema: depool cp3058 and reimage as text_ats T227432
  • 09:31 marostegui: Compare wikidatawiki.pagelinks between labsdb1011 and labsdb1010 - T233986
  • 09:25 moritzm: installing ghostscript updates on thumbor1001
  • 09:24 marostegui: Stop MySQL on db2067 to clone db2133 - T238183
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Full weight to db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9635 and previous config saved to /var/cache/conftool/dbconfig/20191114-092006-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 marostegui: Compare wikidatawiki.pagelinks between db1124:3318 and labsdb1010 - T233986
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 marostegui: Remove ar_comment from triggers on db1124:3315 - T234704
  • 08:41 marostegui: Deploy schema change with replication on db1082, this will generate lag on s5 labs - T233135 T234066
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P9634 and previous config saved to /var/cache/conftool/dbconfig/20191114-084043-marostegui.json
  • 08:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P9633 and previous config saved to /var/cache/conftool/dbconfig/20191114-083729-marostegui.json
  • 08:03 eileen: process-control config revision is 6adc66a20b re-enable backfill
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool a non partitioned slave db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9632 and previous config saved to /var/cache/conftool/dbconfig/20191114-080038-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 T235599', diff saved to https://phabricator.wikimedia.org/P9631 and previous config saved to /var/cache/conftool/dbconfig/20191114-075449-marostegui.json
  • 07:41 eileen: process-control config revision is b7c2cf7227 - disabled backfill again - some error?
  • 07:29 eileen: process-control config revision is 909108622d re-enable omnirecipient date repair job
  • 07:25 eileen: process-control config revision is d3ebeddcc1 (I re-enabled the old backfill job)
  • 07:12 moritzm: installing intel-microcode updates
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1067', diff saved to https://phabricator.wikimedia.org/P9630 and previous config saved to /var/cache/conftool/dbconfig/20191114-065309-marostegui.json
  • 06:16 marostegui: Stop replication on db1067
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1083 to s1 master and remove read-only from s1 T234800', diff saved to https://phabricator.wikimedia.org/P9629 and previous config saved to /var/cache/conftool/dbconfig/20191114-060138-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T234800', diff saved to https://phabricator.wikimedia.org/P9628 and previous config saved to /var/cache/conftool/dbconfig/20191114-060026-marostegui.json
  • 06:00 marostegui: Starting s1 failover from db1067 to db1083 - T234800
  • 05:51 jynus: stopping db1114 replication
  • 05:34 marostegui: Compress db2089:3316 - T235599
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P9627 and previous config saved to /var/cache/conftool/dbconfig/20191114-052400-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P9626 and previous config saved to /var/cache/conftool/dbconfig/20191114-052303-marostegui.json
  • 05:13 marostegui: Move replicas from db1067 to db1083 T234800
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1083 with weight 0 T234800', diff saved to https://phabricator.wikimedia.org/P9625 and previous config saved to /var/cache/conftool/dbconfig/20191114-050940-marostegui.json
  • 05:08 vgutierrez: Repooling cp1077 - T238289
  • 05:07 marostegui: Start pre-failover steps T234800
  • 05:01 kart_: Updated cxserver to 2019-11-13-111130-production tag (T237379, T235748, T236906)
  • 04:56 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:51 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:49 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 03:49 vgutierrez: power cycling cp1077 - T238289
  • 03:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 03:49 vgutierrez: depooling cp1077 - T238289
  • 00:41 ebernhardson: T237849 Start CirrusSearch forceSearchIndex.php commonswiki 2019-10-20T00:00:00 - 2019-11-14T01:00:00 pushing into jobqueue
  • 00:40 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 49s)
  • 00:39 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:39 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 44s)
  • 00:38 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:36 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php: T237849: Restore CirrusSearchBuildDocumentParse hook (duration: 00m 54s)

2019-11-13

  • 23:00 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:58 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:25 catrope@deploy1001: Finished scap: For some reason that limited i18n sync didn't work, trying a full scap (duration: 18m 33s)
  • 22:07 catrope@deploy1001: Started scap: For some reason that limited i18n sync didn't work, trying a full scap
  • 22:04 catrope@deploy1001: scap sync-l10n completed (1.35.0-wmf.5) (duration: 02m 54s)
  • 22:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Update to master (b937dce) (duration: 00m 54s)
  • 20:17 XioNoX: delete unused asw2-esams:ae1
  • 19:37 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (again) (duration: 00m 52s)
  • 18:49 Jeff_Green: authdns-update to remove host alnilam
  • 17:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (duration: 00m 53s)
  • 16:41 gehel: depool wdqs1005 - T238232
  • 16:36 gehel: restart blazegraph on wdqs1005
  • 16:21 ema: pool cp3054 with ATS backend T227432
  • 16:21 gehel: draining elastic1017-1031 to prepare for decommission - T230746
  • 16:02 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P9621 and previous config saved to /var/cache/conftool/dbconfig/20191113-155134-marostegui.json
  • 15:39 moritzm: powercycle cloudbackup2002
  • 15:35 ema: depool cp3054 and reimage as text_ats T227432
  • 15:32 moritzm: rebooting cloudbackup2002
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:29 jynus: shutdown db2072 T237905
  • 15:29 gehel: configuration of new elasticsearch servers completed, all working and pooled - T230746
  • 14:55 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9620 and previous config saved to /var/cache/conftool/dbconfig/20191113-145541-jynus.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P9619 and previous config saved to /var/cache/conftool/dbconfig/20191113-134938-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9618 and previous config saved to /var/cache/conftool/dbconfig/20191113-134625-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9617 and previous config saved to /var/cache/conftool/dbconfig/20191113-133410-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for upgrade', diff saved to https://phabricator.wikimedia.org/P9616 and previous config saved to /var/cache/conftool/dbconfig/20191113-132216-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P9615 and previous config saved to /var/cache/conftool/dbconfig/20191113-131530-marostegui.json
  • 11:56 effie: Upgrade to php 7.2.24-1 mediawiki eqiad hosts and restart php-fpm - T237239
  • 11:55 ema: cp-ats: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:46 moritzm: rebooting cloudcontrol2001-dev for microcode debugging
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 moritzm: rebooting labtestpuppetmaster2001 for microcode debugging
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:27 ema: cp-ats-ulsfo: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:27 moritzm: rebooting cloudcontrol2003-dev for some microcode debugging
  • 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:24 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9614 and previous config saved to /var/cache/conftool/dbconfig/20191113-110802-marostegui.json
  • 11:05 Urbanecm: EU SWAT done
  • 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ffwiki* (T238191)
  • 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 0a90ef9: Update localized logos for the Fula Wikipedia (T238191) (duration: 00m 54s)
  • 10:53 vgutierrez: Testing ats-tls-restart on cp5007 - T237425
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9613 and previous config saved to /var/cache/conftool/dbconfig/20191113-104326-marostegui.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9612 and previous config saved to /var/cache/conftool/dbconfig/20191113-103225-marostegui.json
  • 10:27 gehel: start configuration of new elasticsearch servers - T230746
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9610 and previous config saved to /var/cache/conftool/dbconfig/20191113-102054-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9609 and previous config saved to /var/cache/conftool/dbconfig/20191113-101127-marostegui.json
  • 09:51 jynus: upgraded wmf-mariadb101-client on cumin hosts
  • 09:50 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:43 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:41 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 09:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374 (duration: 11m 19s)
  • 09:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374
  • 09:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki (duration: 02m 35s)
  • 09:06 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki
  • 08:25 marostegui: Stop MySQL on db2062 to copy its data to db2132 T238183
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:09 marostegui: Fix replication on labsdb1010 - T233986
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P9607 and previous config saved to /var/cache/conftool/dbconfig/20191113-070339-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 for compression', diff saved to https://phabricator.wikimedia.org/P9606 and previous config saved to /var/cache/conftool/dbconfig/20191113-070055-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9605 and previous config saved to /var/cache/conftool/dbconfig/20191113-065952-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P9604 and previous config saved to /var/cache/conftool/dbconfig/20191113-065823-marostegui.json
  • 06:25 volker-e@deploy1001: Finished deploy [design/style-guide@edce4cc]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:25 volker-e@deploy1001: Started deploy [design/style-guide@edce4cc]: Deploy design/style-guide:
  • 01:35 eileen: civicrm revision changed from 3c15db25bb to a3714003ff, config revision is d678dbcaa5

2019-11-12

  • 23:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix: Do not return after inserting a single suggestion (duration: 00m 52s)
  • 23:51 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/src/mediawiki.interface.helpers.styles.less: Remove extraneous semicolons (T233649), part 2 (duration: 00m 52s)
  • 23:49 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes/changes/ChangesList.php: Remove extraneous semicolons (T233649), part 1 (duration: 00m 53s)
  • 23:49 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:45 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:22 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:20 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 bblack: repool cp1076 (experiments concluded)
  • 22:35 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: enabling REST API (duration: 00m 52s)
  • 22:34 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: enabling REST API (duration: 00m 52s)
  • 22:32 eileen: civicrm revision changed from bfa53ee611 to 3c15db25bb, config revision is d678dbcaa5
  • 21:54 bblack: depooling cp1076 for some local experimentation
  • 20:18 herron: reprepro copy buster-wikimedia stretch-wikimedia prometheus-elasticsearch-exporter
  • 20:11 otto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:11 otto@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:46 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P7007 --new-data-type external-id (T234221)
  • 19:45 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P4839 --new-data-type external-id (T234221)
  • 19:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Sync a previously undeployed change to InitialiseSettings-labs.php that someone forgot to deploy (as a no-op) in production (duration: 00m 52s)
  • 19:41 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group0 for T198312 (duration: 00m 52s)
  • 19:19 arlolra: Updated Parsoid to 6a0a708 (T215000, T235295, T235656, T235217, T235295, T236846, T237556, T235231)
  • 19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708 (duration: 10m 09s)
  • 18:58 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Final fixes and tweaks for testing (duration: 00m 53s)
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708
  • 18:39 ejegg: re-enabled Omnimail and contact de-duplication jobs
  • 18:20 Urbanecm: Morning SWAT done
  • 18:18 Urbanecm: Deploy security patch for T237887
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 130ef87: Add right "abusefilter-log-private" to usergroup "rollbacker" at ptwiki (T237830) (duration: 00m 53s)
  • 18:08 XioNoX: push pfw change to add recdns anycast IP
  • 17:33 XioNoX: update fasw-c-eqiad to match current standard (ntp/users/rootpw/lldp)
  • 17:22 XioNoX: update fasw-c-codfw to match current standard (ntp/users/rootpw/lldp)
  • 17:03 ema: pool cp3052 with ATS backend T238085
  • 17:03 ema: pool cp3052 with ATS backend T227432
  • 16:53 bblack: cpNNNN (all cache nodes) - cumin manual removal of globalsign-2018 remnants (key, cert, ocsp config, ocsp output)
  • 16:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 XioNoX: setup bgp session from cr2-codfw to multihop RIS collector - T106056
  • 16:21 XioNoX: reboot scs-c1-eqiad.mgmt.eqiad.wmnet - T238036
  • 16:09 ema: depool cp3052 and observe performance impact T238085 before reimaging as text_ats T227432
  • 15:49 marostegui: Deploy schema change on db1102:3315 T233135 T234066
  • 15:45 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fixes and tweaks for initial rollout (duration: 00m 53s)
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for a schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9600 and previous config saved to /var/cache/conftool/dbconfig/20191112-154127-marostegui.json
  • 15:24 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=schema
  • 14:46 bblack: cpNNNN (all caches): remove stale outputs from transient ocsp failures ( /var/cache/ocsp/update-ocsp-*.tmp )
  • 14:41 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 14:38 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4021.ulsfo.wmnet,service=nginx
  • 14:35 ema: cp4021: ats-tls-restart to see if https://gerrit.wikimedia.org/r/550475 fixed the script
  • 14:16 Jeff_Green: authdns-update to deploy fundraising-read.wmnet service cname adjustment
  • 14:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set all of wikidata for write both for term store" (duration: 00m 52s)
  • 12:57 godog: refresh kibana field list
  • 12:46 gehel: repool wdqs1004
  • 12:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 100 (T237984)
  • 12:19 onimisionipe: restarting blazegraph on wdqs1005
  • 12:11 effie: Reimage mwdebug1002 - T214734
  • 11:47 Amir1: EU SWAT is done
  • 11:47 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase term store error reduction, Do not catch DBError in ReplicaMasterAwareRecordIdsAcquirer. (T236466) (duration: 00m 56s)
  • 11:44 effie: Upgrade wtp* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata for write both for term store (T225055) (duration: 00m 52s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SECURITY: Don't allow Wikimedia sysops to see who had 2FA disabled (duration: 00m 53s)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9599 and previous config saved to /var/cache/conftool/dbconfig/20191112-104400-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9598 and previous config saved to /var/cache/conftool/dbconfig/20191112-103641-marostegui.json
  • 10:35 onimisionipe: resetting cronfile on wdqs hosts
  • 10:33 marostegui: Drop labtestwiki database from m5 master db1133 - T236010
  • 10:30 marostegui: Deploy schema change on dbstore1003:3315
  • 10:07 ema: repool cp3065, nothing interesting in kern.log and SEL T238032
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9596 and previous config saved to /var/cache/conftool/dbconfig/20191112-095221-marostegui.json
  • 09:42 marostegui: Remove privileges for labtestwiki on m5 - T236010
  • 09:27 gehel: restarting blazegraph on wdqs1004
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083', diff saved to https://phabricator.wikimedia.org/P9595 and previous config saved to /var/cache/conftool/dbconfig/20191112-091706-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for mariadb upgrade to 10.1.39 - T234800', diff saved to https://phabricator.wikimedia.org/P9594 and previous config saved to /var/cache/conftool/dbconfig/20191112-091158-marostegui.json
  • 09:11 marostegui: Upgrade mariadb to 10.1.39 on db1083 (candidate master for s1)
  • 08:56 moritzm: restarting archiva to pick up Java security updates
  • 08:44 volker-e@deploy1001: Finished deploy [design/style-guide@3de6820]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:44 volker-e@deploy1001: Started deploy [design/style-guide@3de6820]: Deploy design/style-guide:
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9593 and previous config saved to /var/cache/conftool/dbconfig/20191112-083720-marostegui.json
  • 08:37 gehel: depool wdqs1004 to investigate update lag
  • 08:35 moritzm: installing poppler security updates
  • 08:24 volker-e@deploy1001: Finished deploy [design/style-guide@b926b95]: Deploy design/style-guide: (duration: 00m 07s)
  • 08:24 volker-e@deploy1001: Started deploy [design/style-guide@b926b95]: Deploy design/style-guide:
  • 08:15 moritzm: installing curl security updates
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9592 and previous config saved to /var/cache/conftool/dbconfig/20191112-081322-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9591 and previous config saved to /var/cache/conftool/dbconfig/20191112-074006-marostegui.json
  • 07:36 elukey: remove /etc/logrotate.d/wdqs_autodeployment_log from wdqs1009 (not in puppet anymore and causing cronspam)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9590 and previous config saved to /var/cache/conftool/dbconfig/20191112-072823-marostegui.json
  • 07:10 marostegui: Upgrade kernel on db1083 (s1 candidate master)
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade - T234800', diff saved to https://phabricator.wikimedia.org/P9589 and previous config saved to /var/cache/conftool/dbconfig/20191112-070436-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:44 marostegui: Change triggers on s5 db2094 - T234704
  • 06:40 marostegui: Deploy schema change on s5 codfw with replication, this will generate lag on s5 codfw T233135 T234066
  • 06:21 marostegui: Compress db2087:3316, db2087:3317 T235599
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for compression - T235599', diff saved to https://phabricator.wikimedia.org/P9588 and previous config saved to /var/cache/conftool/dbconfig/20191112-061959-marostegui.json
  • 03:41 vgutierrez: restart wdqs-blazegraph on wdqs1004

2019-11-11

  • 22:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 22:49 ema: power-cycle cp3065, currently down
  • 19:36 XioNoX: disable ALGs on mr1-esams
  • 18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 00m 57s)
  • 18:19 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 15m 14s)
  • 18:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 17:44 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:44 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:42 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 14:26 ema: pool cp3050 with ATS backend T227432
  • 13:50 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:25 ema: depool cp3050 and reimage as text_ats T227432
  • 12:59 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 12:46 effie: Upgrade to 7.2.24-1 mwdebug[2001-2002].codfw.wmnet,mwmaint2001.codfw.wmnet,deploy2001.codfw.wmnet - T237239
  • 12:31 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010 (duration: 00m 28s)
  • 12:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010
  • 12:28 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 12:21 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T231881
  • 11:55 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:52 hoo: Updated the Wikidata property suggester with data from the 2019-11-04 JSON dump and applied the T132839 workarounds
  • 10:48 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:47 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:32 vgutierrez: restarting ats-tls on cp1088
  • 10:21 jynus: upgrade mariadb on db2102
  • 10:16 ema: repool cp4027 after successful X-Wikimedia-Debug testing P9585 T237687
  • 10:12 jynus: manually run full backup of labtestpuppetmaster2001 T235819
  • 09:41 ema: test x-wikimedia-debug-routing.lua on cp4027 (depooled) T237687
  • 09:09 volker-e@deploy1001: Finished deploy [design/style-guide@0ea65f2]: Deploy design/style-guide: (duration: 00m 07s)
  • 09:09 volker-e@deploy1001: Started deploy [design/style-guide@0ea65f2]: Deploy design/style-guide:
  • 08:28 marostegui: Stop MySQL on db2048 before decommissioning - T237913
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2048 from config T237913 (duration: 00m 51s)
  • 08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2048 from config T237913 (duration: 00m 54s)
  • 08:21 marostegui: Remove db2048 from tendril and zarcillo T237913
  • 06:56 elukey: delete /etc/logrotate.d/wdqs-reload-categories from wdqs* as an attempt to reduce cronspam
  • 06:44 marostegui: Delete globalblocks table from napwikisource T230055
  • 05:27 vgutierrez: Switch from nginx to ats-tls on cp3058 - T231627

2019-11-09

  • 20:25 reedy@deploy1001: Synchronized langlist-labs: T237823 (duration: 00m 54s)
  • 02:39 volker-e@deploy1001: Finished deploy [design/style-guide@d2bfc09]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:39 volker-e@deploy1001: Started deploy [design/style-guide@d2bfc09]: Deploy design/style-guide:
  • 01:07 volker-e@deploy1001: Finished deploy [design/style-guide@ef82b69]: Deploy design/style-guide: (duration: 00m 07s)
  • 01:07 volker-e@deploy1001: Started deploy [design/style-guide@ef82b69]: Deploy design/style-guide:
  • 01:06 volker-e@deploy1001: Finished deploy [design/style-guide@97fb3ee]: Deploy design/style-guide: (duration: 00m 09s)
  • 01:06 volker-e@deploy1001: Started deploy [design/style-guide@97fb3ee]: Deploy design/style-guide:

2019-11-08

  • 20:26 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation request jobs by 5 mins for testing (duration: 00m 52s)
  • 16:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "MachineVision: Enable testers-only mode on testcommonswiki for debugging" (duration: 00m 54s)
  • 15:57 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118, db1106 at 100%', diff saved to https://phabricator.wikimedia.org/P9582 and previous config saved to /var/cache/conftool/dbconfig/20191108-155700-jynus.json
  • 15:37 herron: beginning rolling service restarts on logstash hosts for java security updates
  • 15:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enable testers-only mode on testcommonswiki for debugging (duration: 00m 52s)
  • 14:56 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:55 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9581 and previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
  • 14:42 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: stop and upgrade percona-server on test host db1114
  • 13:27 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:12 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9580 and previous config saved to /var/cache/conftool/dbconfig/20191108-131257-jynus.json
  • 13:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee2027c: Change the language of Votewiki back to English (en) (T230614) (duration: 00m 54s)
  • 12:34 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:14 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 10%', diff saved to https://phabricator.wikimedia.org/P9578 and previous config saved to /var/cache/conftool/dbconfig/20191108-121444-jynus.json
  • 12:02 jynus: update and restart db1118
  • 12:01 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1118 fully', diff saved to https://phabricator.wikimedia.org/P9577 and previous config saved to /var/cache/conftool/dbconfig/20191108-120138-jynus.json
  • 11:55 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9576 and previous config saved to /var/cache/conftool/dbconfig/20191108-115553-jynus.json
  • 11:27 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9575 and previous config saved to /var/cache/conftool/dbconfig/20191108-112733-jynus.json
  • 11:25 jynus@cumin1001: dbctl commit (dc=all): 'repool db2130', diff saved to https://phabricator.wikimedia.org/P9574 and previous config saved to /var/cache/conftool/dbconfig/20191108-112503-jynus.json
  • 11:12 jynus: update and restart db2130
  • 11:11 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2116, depool db2130', diff saved to https://phabricator.wikimedia.org/P9573 and previous config saved to /var/cache/conftool/dbconfig/20191108-111125-jynus.json
  • 10:58 Amir1: running rebuildItemTerms on 8028 items (T234329)
  • 10:51 jynus: update and restart db2116
  • 10:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2103, depool db2116', diff saved to https://phabricator.wikimedia.org/P9572 and previous config saved to /var/cache/conftool/dbconfig/20191108-105013-jynus.json
  • 10:38 jynus: update and restart db2103
  • 10:34 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephmon[1-3] T228102
  • 10:33 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephosd[1-3] T224188
  • 10:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2092, depool db2103', diff saved to https://phabricator.wikimedia.org/P9571 and previous config saved to /var/cache/conftool/dbconfig/20191108-103218-jynus.json
  • 10:19 jynus: update and restart db2092
  • 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2071, depool db2092', diff saved to https://phabricator.wikimedia.org/P9570 and previous config saved to /var/cache/conftool/dbconfig/20191108-101759-jynus.json
  • 10:09 elukey: restart jvm-based hadoop daemons on an-master100[1,2] to pick up the new openjdk version
  • 10:06 jynus: update and restart db2071
  • 10:03 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P9569 and previous config saved to /var/cache/conftool/dbconfig/20191108-100310-jynus.json
  • 10:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2072', diff saved to https://phabricator.wikimedia.org/P9568 and previous config saved to /var/cache/conftool/dbconfig/20191108-100128-jynus.json
  • 09:50 moritzm: uploaded openjdk 8u232-b09-1~deb10u1 to component/jdk8 for apt.wikimedia.org/buster-wikimedia
  • 09:41 jynus: update and restart db2072
  • 09:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9567 and previous config saved to /var/cache/conftool/dbconfig/20191108-094100-jynus.json
  • 09:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9566 and previous config saved to /var/cache/conftool/dbconfig/20191108-093958-jynus.json
  • 09:35 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 09:29 jynus: update and restart db2094
  • 09:27 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9565 and previous config saved to /var/cache/conftool/dbconfig/20191108-092735-jynus.json
  • 09:10 jynus: update and restart db1106
  • 09:08 moritzm: installing Java security updates on kafka-jumbo
  • 09:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 fully', diff saved to https://phabricator.wikimedia.org/P9564 and previous config saved to /var/cache/conftool/dbconfig/20191108-090746-jynus.json
  • 09:05 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 09:04 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9563 and previous config saved to /var/cache/conftool/dbconfig/20191108-090451-jynus.json
  • 09:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9562 and previous config saved to /var/cache/conftool/dbconfig/20191108-090012-jynus.json
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:52 jynus: stop and upgrade db1124 (may create temporary lag on wikireplicas)
  • 08:31 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:23 elukey: restart kafka on kafka-jumbo1001 to test the new openjdk
  • 08:07 moritzm: installing fribidi security updates on Buster
  • 03:03 vgutierrez: Switch from nginx to ats-tls on cp3054 - T231627
  • 02:42 vgutierrez: Switch from nginx to ats-tls on cp3052 - T231627
  • 01:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GlobalBlocking/: Prevent some extra db queries (duration: 00m 53s)
  • 01:14 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Use internationalized semicolon separators (T233649) (duration: 00m 53s)
  • 01:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic (duration: 03m 04s)
  • 01:06 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic
  • 00:44 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logging.js: Fix homepage instrumentation (T237600) (duration: 00m 52s)
  • 00:40 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes: Sync DiffEngine changes that were needed to unbreak CI (duration: 00m 55s)
  • 00:34 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Semicolon should appear after log entries (T237500) (duration: 00m 53s)
  • 00:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix remote API configs for GrowthExperiments (duration: 00m 51s)
  • 00:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable suggested edits as hidden preference on arwiki, cswiki, kowiki, viwiki (T236968) (duration: 00m 53s)

2019-11-07

  • 23:49 foks: removing one file for legal compliance
  • 23:47 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert phatality again (duration: 03m 04s)
  • 23:44 shdubsh: start elasticsearch on logstash1008
  • 23:44 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert phatality again
  • 23:41 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: one more time (duration: 03m 00s)
  • 23:38 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: one more time
  • 23:31 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout (duration: 03m 02s)
  • 23:28 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout
  • 23:23 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert to previous phatality plugin version (duration: 02m 55s)
  • 23:20 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert to previous phatality plugin version
  • 23:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 00m 06s)
  • 23:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 23:04 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 06m 48s)
  • 23:00 XenoRyet: updated payments-wiki from aac3d93f70 to bd907656fb
  • 22:57 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 22:53 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes (duration: 00m 05s)
  • 22:53 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes
  • 22:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Remove annotation job delay (duration: 00m 53s)
  • 22:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions (duration: 00m 06s)
  • 22:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions
  • 21:54 andrewbogott: rebuilding labtestpuppetmaster2001 w/Stretch
  • 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
  • 21:28 mutante: boron apt-get clean (saved 9G on /) (T237649)
  • 20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.5 refs T233853
  • 20:24 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ArticleTarget.js: Fix error handling (duration: 01m 00s)
  • 20:21 herron: performing rolling reboots of kafka-main hosts for security updates
  • 20:17 onimisionipe: cluster restart for cloudelastic to pick up JVM upgrade
  • 20:08 eileen: civicrm revision changed from f1ce5c86f7 to bfa53ee611, config revision is 72d2692743
  • 19:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enqueue annotation job on upload complete (duration: 05m 19s)
  • 18:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Disable retrying annotation requests (duration: 05m 17s)
  • 18:25 ebernhardson: restart mjolnir-kafka-bulk-daemon and mjolnir-kafka-msearch-daemon across `cirrus` dsh group
  • 18:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration (duration: 05m 49s)
  • 18:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration
  • 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:38 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:30 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:25 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Drop currently unsupported external dependencies (T227349) (duration: 05m 19s)
  • 17:10 XioNoX: Homer push - forwarding-options - to all cr
  • 17:09 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:08 XioNoX: add sampling stanza (disabled) to cr2-esams
  • 17:00 mutante: wtp2020 - 2 hours downtime - shut down (T205712) - go ahead @papaul
  • 17:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 16:58 mutante: wtp2020 - depooled for T205712
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp2020.codfw.wmnet
  • 16:42 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: some alphasorted config (duration: 01m 00s)
  • 16:34 XioNoX: Homer push on cr2-knams: Sampling (disabled), enhanced-hash-key, ospf interfaces re-ordering (noop), policy-statement BGP_from_LVS (unused), lo0 term allow_vmhost
  • 16:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 100%', diff saved to https://phabricator.wikimedia.org/P9553 and previous config saved to /var/cache/conftool/dbconfig/20191107-163235-jynus.json
  • 16:20 XioNoX: add BGP sessions to AS64050 in eqiad
  • 16:15 XioNoX: add BGP sessions to AS57695 in esams and eqiad
  • 16:12 XioNoX: clear v4 BGP sessions to AS7713 in eqsin (hit max prefix limit)
  • 16:02 mutante: mw2225 restart cron (T236799)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta logging (duration: 01m 00s)
  • 15:41 XioNoX: remove BGP to AS3491 on eqiad (left the IX)
  • 15:40 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:53 jbond42: rebuilding compiler1001
  • 13:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 50%', diff saved to https://phabricator.wikimedia.org/P9551 and previous config saved to /var/cache/conftool/dbconfig/20191107-135018-jynus.json
  • 12:47 Urbanecm: EU SWAT done
  • 12:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 8e71601: a36ed85: GrowthExperiments: Configure testwiki for suggested edits testing + follow up patch (T237634) (duration: 00m 59s)
  • 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 19034af: GrowthExperiments: Configure intro links for suggested edits (T235723) (duration: 01m 00s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2be3f86: [cirrus] remove cross_cluster_single_shard_search quirk (duration: 01m 02s)
  • 12:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5253dec: Give commonswiki filemovers `suppressredirect` rights (T236348) (duration: 01m 03s)
  • 11:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 fully (duration: 01m 01s)
  • 11:54 jbond42: update puppet_version used by CI 545289
  • 11:50 jbond42: rebuilding compiler1002
  • 11:36 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 10%', diff saved to https://phabricator.wikimedia.org/P9550 and previous config saved to /var/cache/conftool/dbconfig/20191107-113611-jynus.json
  • 11:16 jynus: stop and upgrade db1080
  • 10:58 moritzm: installing Java security updates on kafka-main/logstash
  • 10:50 moritzm: installing Java security updates on wdqs/maps
  • 10:46 jynus@cumin1001: dbctl commit (dc=all): 'Fully depool db1080', diff saved to https://phabricator.wikimedia.org/P9549 and previous config saved to /var/cache/conftool/dbconfig/20191107-104618-jynus.json
  • 10:28 moritzm: upgrading mw1277-1279 servers to PHP 7.2.24 T237239
  • 10:27 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1080 weight', diff saved to https://phabricator.wikimedia.org/P9548 and previous config saved to /var/cache/conftool/dbconfig/20191107-102747-jynus.json
  • 09:41 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 with low weight (duration: 01m 02s)
  • 09:30 jynus: stop and upgrade es1016
  • 09:18 moritzm: installing Java security updates on aqs/druid/Hadoop
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1016 (duration: 01m 04s)
  • 09:03 jynus: stop and upgrade es2012, es2014
  • 08:48 jynus: stop and upgrade es2011
  • 08:30 jynus: upgrade and restart db2093
  • 00:21 XioNoX: enable interface damping on primary eqsin-codfw link - T236878
  • 00:09 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/549227 (duration: 01m 00s)
  • 00:00 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 04m 29s)

2019-11-06

  • 23:56 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 23:55 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 14m 56s)
  • 23:40 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 22:36 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on commonswiki (T227349)
  • 22:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 22:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on commonswiki (T227349) (duration: 01m 00s)
  • 22:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation jobs on commonswiki only (duration: 01m 01s)
  • 22:17 mdholloway: created MachineVision extension tables on commonswiki
  • 22:13 XioNoX: push standard forwarding-options to cr3/4-ulsfo
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 22:04 mholloway-shell@deploy1001: Synchronized private/PrivateSettings.php: Configure Google Cloud Vision API credentials (2/2) (T236426) (duration: 00m 59s)
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1247.eqiad.wmnet
  • 22:03 mholloway-shell@deploy1001: Synchronized private/GoogleCloudVision.php: Configure Google Cloud Vision API credentials (1/2) (T236426) (duration: 00m 59s)
  • 21:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Allow specifying API credentials as an associative array (T236426) (duration: 01m 01s)
  • 21:53 thcipriani: checkout /srv/mediawiki-staging/php-1.35.0-wmf.5/maintenance/Maintenance.php looks like a local change for debugging left behind
  • 21:47 arlolra: Updated Parsoid to 1d283ed (T237104, T227209, T236865)
  • 21:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed (duration: 10m 22s)
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1247.eqiad.wmnet
  • 21:14 XioNoX: push standard forwarding-options to cr3-esams
  • 21:12 milimetric@deploy1001: Finished deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns (duration: 10m 52s)
  • 21:01 milimetric@deploy1001: Started deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns
  • 20:36 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/OpenStackManager/: sync openstackmanager to deploy https://gerrit.wikimedia.org/r/#/q/I5b08f0069941052acdd9f05a62aac5b2cf9ecdd5 (duration: 01m 00s)
  • 20:34 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.5 refs T233853 (duration: 01m 00s)
  • 20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.5 refs T233853
  • 19:05 mutante: mw1225 - re-enabling puppet (no reason given, nothing in SAL or Phab but disabled)
  • 18:43 mutante: LDAP - add dwisehaupt to wmf group (T235676)
  • 18:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix typo (T222117) (duration: 01m 00s)
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Instrument logging to ClosedWikiProvider (T222117) (duration: 01m 01s)
  • 17:22 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1126 weight, too much backlog', diff saved to https://phabricator.wikimedia.org/P9542 and previous config saved to /var/cache/conftool/dbconfig/20191106-172235-jynus.json
  • 17:21 ejegg: turned off donation queue consumer for financial_trxn record fix
  • 17:17 ejegg: updated Fundraising CiviCRM from 1c3be265ae to f1ce5c86f7
  • 17:15 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 fully (duration: 00m 59s)
  • 17:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WebAuthn extension if wmgUseWebAuthn is set (false in all of production) T227242 (duration: 01m 00s)
  • 17:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseWebAuthn false in all of production T227242 (duration: 01m 01s)
  • 17:08 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 fully', diff saved to https://phabricator.wikimedia.org/P9541 and previous config saved to /var/cache/conftool/dbconfig/20191106-170852-jynus.json
  • 16:11 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on testcommonswiki (T227349)
  • 15:58 mdholloway: created MachineVision tables on testcommonswiki (T227349)
  • 15:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure MachineVision and enable on testcommonswiki (T227349) (duration: 01m 00s)
  • 15:47 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: MachineVision: Use an HTTP proxy in production (T236843) (duration: 01m 01s)
  • 15:42 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Do not restrict to testing users on Beta (duration: 01m 00s)
  • 15:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Fix Beta config with updated service name (duration: 01m 02s)
  • 14:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 with low weight (duration: 00m 59s)
  • 14:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable streaks and revert counts (T234955, T234956) (duration: 01m 00s)
  • 14:27 jynus: upgrade and restart es1019
  • 14:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 01m 00s)
  • 14:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 50%', diff saved to https://phabricator.wikimedia.org/P9539 and previous config saved to /var/cache/conftool/dbconfig/20191106-140702-jynus.json
  • 12:38 Urbanecm: EU SWAT done
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 2/2) (duration: 01m 00s)
  • 12:36 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 1/2) (duration: 00m 59s)
  • 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3e9ede0: Add 104 (Cookbook) to $wgContentNamespaces for bnwikibooks (T236840) (duration: 01m 00s)
  • 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5875c45: [cirrus] Disable instant indexing on wikidata (duration: 01m 15s)
  • 11:57 jynus: upgrade and restart db2048
  • 11:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 10%', diff saved to https://phabricator.wikimedia.org/P9537 and previous config saved to /var/cache/conftool/dbconfig/20191106-113510-jynus.json
  • 11:14 jynus: stopping db1074 for maintenance (will create temporary s2 lag on wikireplicas)
  • 11:06 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P9536 and previous config saved to /var/cache/conftool/dbconfig/20191106-110603-jynus.json
  • 09:46 moritzm: upgrading mw1262-mw1265,mw1276 servers to PHP 7.2.24 T237239
  • 09:33 jynus: stop and upgrade labsdb1011 T236015
  • 09:25 jynus: depooling labsdb1011 for wikireplica service T236015
  • 09:10 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 08:58 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 08:51 jynus: upgrading wmf-mariadb101-client on cumin hosts
  • 08:51 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.24 T237239
  • 08:33 jynus: upgrading db2102 mariadb (test-s1)
  • 07:48 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 07:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 02:59 vgutierrez: Switch from nginx to ats-tls on cp5012 - T231627
  • 00:07 mdholloway: created table wikimedia_editor_tasks_edit_streak on x1/wikishared (T234956)
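
The db1074 entries above trace a staged repool: depool at 11:06, then repool at 10%, at 50%, and finally at full weight. The following is a minimal Python sketch, not part of any Wikimedia tooling, for pulling such a sequence out of pasted log lines. The regular expression is inferred only from the entry format visible in this log, and the names `dbctl_commits` and `SAMPLE` are hypothetical.

```python
import re

# Entry shape inferred from this log:
#   HH:MM user@host: dbctl commit (dc=...): 'message', diff saved to ...
DBCTL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"dbctl commit \(dc=(?P<dc>\w+)\): '(?P<msg>[^']*)'"
)


def dbctl_commits(lines, host):
    """Return (time, message) pairs for dbctl commits mentioning `host`."""
    hits = []
    for line in lines:
        match = DBCTL_RE.search(line)
        if match and host in match.group("msg"):
            hits.append((match.group("time"), match.group("msg")))
    # The log lists the newest entry first, so reverse for chronological order.
    return hits[::-1]


# Entries from 2019-11-06, shortened after the paste URL for readability.
SAMPLE = [
    "17:22 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1126 weight, too much backlog', diff saved to https://phabricator.wikimedia.org/P9542",
    "17:08 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 fully', diff saved to https://phabricator.wikimedia.org/P9541",
    "14:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 50%', diff saved to https://phabricator.wikimedia.org/P9539",
    "11:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 10%', diff saved to https://phabricator.wikimedia.org/P9537",
    "11:06 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P9536",
]

if __name__ == "__main__":
    for time, msg in dbctl_commits(SAMPLE, "db1074"):
        print(time, msg)
```

Filtering on the quoted commit message rather than the whole line keeps unrelated hosts, such as the db1126 entry, out of the result.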

2019-11-05

  • 23:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.5 refs T233853
  • 23:25 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.5 refs T233853 (duration: 24m 13s)
  • 23:01 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:51 twentyafterfour@deploy1001: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2905573311"/* "/srv/mediawiki-staging/php-1.35.0-wmf.5/cache/l10n"' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:50 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:39 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2076118383" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:38 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:17 twentyafterfour: scap failed with error: A copy of your installation's LocalSettings.php must exist and be readable in the source directory. Use --conf to specify it. refs T233853
  • 22:09 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_840646293" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 04m 54s)
  • 22:04 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:03 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr2-esams:lo0.0
  • 21:58 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr3-esams:lo0.0
  • 20:45 mutante: shutting down cobalt (formerly gerrit server)
  • 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:33 XioNoX: push fw policies to pfw3-eqiad - T236201
  • 20:23 XioNoX: push fw policies to pfw3-codfw - T236201
  • 20:17 joal@deploy1001: Finished deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch (duration: 08m 21s)
  • 20:09 joal@deploy1001: Started deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade (duration: 08m 49s)
  • 20:00 joal@deploy1001: Started deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade
  • 18:40 twentyafterfour: MediaWiki train: start branching wmf/1.35.0-wmf.5
  • 18:30 XioNoX: fix typo on cr1-eqsin:lo0.0 v6 IP
  • 18:27 ejegg: updated payments-wiki from 0de9d96208 to aac3d93f70
  • 17:21 jynus: restarting etherpad
  • 16:56 arturo: deleted stretch-wikimedia/thirdparty/kubeadm-k8s and created buster-wikimedia/thirdparty/kubeadm-k8s
  • 16:24 papaul: Replacing disk on db2120
  • 15:37 jynus: deploying schema change on x1 T234955
  • 15:20 ema: cp4027: upgrade trafficserver to 8.0.5-1wm10
  • 14:37 jynus: reducing consistency temporarily on db1114 so it can catch up replication
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 ema: pool cp5012 with ATS backend T227432
  • 10:45 vgutierrez: restarting atsmtail@backend on cp5006
  • 09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 ema: wb2-phab stopped saying things a while ago. Restarted
  • 09:18 jynus: restart dbprov100[12] T236924
  • 09:11 jynus: restart dbprov2001 T236924
  • 08:12 vgutierrez: uploaded fifo-log-demux 0.6 to apt.wikimedia.org (stretch)
  • 08:02 jynus: redact mnwwiki on db1124 and db2094 T235743
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp5011 - T231627
  • 04:13 vgutierrez: Switch from nginx to ats-tls on cp5010 - T231627
  • 03:51 vgutierrez: pooling cp3057 - T237348
  • 03:46 mutante: wdqs1004 restarting wdqs-blazegraph
  • 03:01 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 02:59 vgutierrez: depool cp3057 - T237348
  • 00:15 mutante: gerrit - restarting service to re-enable jgit gc (T217497)
  • 00:13 mutante: gerrit2001 - restart gerrit (replica)

2019-11-04

  • 23:18 milimetric@deploy1001: Finished deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs (duration: 07m 20s)
  • 23:11 milimetric@deploy1001: Started deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs
  • 23:05 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:03 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:08 bd808: The Wikimedia SAL Twitter feed is now @wikimedia_sal (https://twitter.com/wikimedia_sal) T237322
  • 20:51 bd808: Testing twitter feed following account confirmation
  • 19:23 Urbanecm: Morning SWAT done
  • 19:17 mutante: cobalt - stopping services, removing apache2
  • 19:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6a4b966: Add throttle rule for bard college editathon (T236955) (duration: 00m 54s)
  • 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9204768: Enable DNS blacklist for es.wikinews (T237151) (duration: 00m 53s)
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0fc3909: Allow FlaggedRevs autoreview permission to be assigned globally (duration: 00m 54s)
  • 18:30 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode (duration: 03m 27s)
  • 18:26 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode
  • 18:24 ppchelko@deploy1001: Finished deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902 (duration: 14m 30s)
  • 18:17 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts (duration: 12m 07s)
  • 18:09 ppchelko@deploy1001: Started deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902
  • 18:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts
  • 17:41 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Update for YAML-reading (offline) (duration: 00m 52s)
  • 17:39 jforrester@deploy1001: Synchronized wmf-config/config/: Sync out YAML config files (duration: 00m 56s)
  • 15:43 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable revert counts on beta (T234955) (duration: 00m 53s)
  • 15:36 jynus: running failing check_private_data report on labsdb1009 T235743
  • 15:33 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 00m 59s)
  • 15:01 joal@deploy1001: Started restart [analytics/aqs/deploy@59a97fa]: (no justification provided)
  • 14:36 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:36 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 ema: upload trafficserver 8.0.5-1wm10 to stretch-wikimedia
  • 13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:47 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 elukey: update bacula terms on analytics-in{4,6} filters on cr{1,2}-eqiad - T237016
  • 13:28 jbond42: update production puppetmasters to use new puppetdb servers
  • 13:20 Amir1: Creating Mon Wikipedia is done T235739
  • 13:19 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 13:16 ladsgroup@deploy1001: Synchronized langlist: T235739 (duration: 00m 52s)
  • 13:15 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T235739 (duration: 00m 53s)
  • 13:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T235739 (duration: 00m 53s)
  • 13:13 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T235739 (duration: 00m 52s)
  • 13:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T235739
  • 13:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 53s)
  • 13:06 ema: depool cp5012 and reimage as text_ats T227432
  • 12:21 Urbanecm: EU SWAT done
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 2/2) (duration: 00m 52s)
  • 12:12 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki* (T236905)
  • 12:11 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 1/2) (duration: 00m 53s)
  • 12:08 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: a6d64b1: Update logo for zh-classical Wikipedia (T236905) (duration: 00m 53s)
  • 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c92a13c: Enable partial blocks on kowiki (T236752) (duration: 00m 54s)
  • 12:00 moritzm: upgrading mw1261 to PHP 7.2.24 (T237239)
  • 11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 11:08 moritzm: uploaded PHP 7.2.24 to apt.wikimedia.org stretch-wikimedia/component/php72 (T237239)
  • 04:53 vgutierrez: Switch from nginx to ats-tls on cp5009 - T231627
  • 04:39 vgutierrez: Switch from nginx to ats-tls on cp5008 - T231627

2019-11-03

  • 03:54 andrew@deploy1001: Finished deploy [horizon/deploy@0c024d4]: one more prefix fix (duration: 03m 35s)
  • 03:50 andrew@deploy1001: Started deploy [horizon/deploy@0c024d4]: one more prefix fix
  • 03:10 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try) (duration: 00m 25s)
  • 03:10 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try)
  • 03:09 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (duration: 06m 01s)
  • 03:03 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation

2019-11-02

  • 00:58 mutante: gerrit-replica - created missing /var/lib/gerrit2/review_site/tmp and restarted service - service back up on buster (T176774)
  • 00:34 mutante: gerrit-replica - fixing permissions of files in /srv/gerrit and restarting
  • 00:27 mutante: gerrit2001 - copy mysql-connector-java.jar into /usr/share/java/ and link it into /var/lib/gerrit2/review_site/lib (T176774)
  • 00:05 mutante: rsyncing gerrit plugin dir from gerrit1001 to gerrit2001 (T176774)

2019-11-01

  • 23:45 mutante: rsyncing gerrit git data from gerrit1001 to gerrit2001 (using --delete too!) T176774
  • 22:00 mutante: gerrit - repo sync between gerrit and gerrit-replica in progress .. if you can't clone from replica you can use main gerrit and replica will come back
  • 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: T237126 Fixing DOM in upload interface of UploadWizard (duration: 00m 56s)
  • 21:06 mutante: scp /usr/share/java/mysql-connector-java.jar from gerrit1001 to gerrit2001 (T176774)
  • 20:46 cdanis: add to bot_blocked_nets the IPs of several EC2 instances sending expensive requests to ORES T237134
  • 19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mutante: gerrit2001 - reinstalling with buster
  • 19:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration (duration: 00m 11s)
  • 19:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration
  • 16:39 XioNoX: push Add BGP_from_LVS policy and term vmhost to loopback4 filter to CRs
  • 16:37 ema: pool cp5011 with ATS backend T227432
  • 16:16 XioNoX: asw2-a-eqiad# run request system license add terminal
  • 15:39 moritzm: installing libonig security updates
  • 15:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 moritzm: installing libpcap security updates
  • 15:11 moritzm: installing python-ecdsa security updates
  • 14:34 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 ema: depool cp5011 and reimage as text_ats T227432
  • 14:02 moritzm: rebooting kafka-main1004 for microcode tests
  • 14:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:56 moritzm: upgrading mwdebug2002 to PHP 7.2.24 for some smoke tests with the new build
  • 12:18 ema: pool cp5010 with ATS backend T227432
  • 11:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 ema: depool cp5010 and reimage as text_ats T227432
  • 11:08 effie: enable puppet mediawiki and prometheus servers
  • 10:54 effie: remove prometheus-hhvm-exporter package from mw* servers - T229792
  • 10:37 moritzm: installing clamav security updates on mendelevium
  • 10:33 effie: Disable puppet on mediawiki and prometheus servers to remove hhvm exporters - T229792
  • 09:28 moritzm: installing file security updates on jessie
  • 09:21 effie: depool mw1317
  • 09:19 moritzm: installing golang-1.11 security updates
  • 08:57 moritzm: installing ruby-loofah security updates
  • 08:17 moritzm: installing libarchive security updates
  • 01:58 volker-e@deploy1001: Finished deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements (duration: 00m 05s)
  • 01:58 volker-e@deploy1001: Started deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements
  • 01:21 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/resources/src/mediawiki.widgets/mw.widgets.UsersMultiselectWidget.js: T236460 mw.widgets.UsersMultiselectWidget: Fix property name (duration: 00m 54s)

2019-10-31

  • 23:33 Urbanecm: Evening SWAT done
  • 23:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice/extension.json: SWAT: dcd3ec3: Fix error in CentralNoticeImpression schema (T236627) (duration: 00m 51s)
  • 23:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/VisualEditor/: SWAT: 3686b82: Revert "Parse relative hrefs on image nodes like on regular links" (T237040) (duration: 00m 53s)
  • 23:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 02bf4b8: Re-enable mobile editor A/B testing (T236337) (duration: 00m 52s)
  • 23:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki* (T237035)
  • 23:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 54ee973: Change bawiki logo to an anniversary one (T237035) (duration: 00m 53s)
  • 23:04 eileen: civicrm revision changed from d2045c6b98 to 1183915bde, config revision is 1a709a61aa
  • 23:00 mutante: replacing deployment keys for apache2secmod ; re-arming keyholder on deployment server
  • 22:51 XioNoX: Homer push to cr1/2-eqiad
  • 22:17 XioNoX: Homer push to cr1/2-codfw
  • 22:14 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 00m 06s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:12 mutante: vega sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:12 twentyafterfour@deploy1001: deploy aborted: testing deploy_design (duration: 05m 07s)
  • 22:12 mutante: bromine sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:05 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 01m 30s)
  • 22:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 21:59 mutante: deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677)
  • 21:49 mutante: deploy1001 keyholder restart, keyholder arm ...
  • 21:46 mutante: deploy1001 - move apache2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677)
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 (duration: 13m 44s)
  • 21:25 robh: setting up ps1-b8-eqiad per T227543. it will reboot twice in the next 15 minutes, and then should start to clear up in icinga
  • 21:18 ppchelko@deploy1001: Started deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902
  • 20:35 XioNoX: Homer push to all cr2-eqdfw - new NTP servers, remove border-in4 term unused-ips, add (unused) BGP_Wikimedia_pops, re-order ospf interfaces
  • 20:27 shdubsh: restarting logstash on logstash1008 to test level->severity filter selector
  • 20:12 XioNoX: Homer push to all msw* - new NTP servers - T237011
  • 20:07 XioNoX: Homer push to all asw* - new NTP servers - T237011
  • 19:49 XioNoX: Homer push to eqsin
  • 19:49 mutante: rsyncing home dirs from previous gerrit server cobalt to gerrit1001
  • 19:36 fdans@deploy1001: Finished deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt (duration: 06m 53s)
  • 19:31 XioNoX: Homer push to ulsfo
  • 19:29 fdans@deploy1001: Started deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt
  • 19:08 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.4
  • 18:22 Urbanecm: Morning SWAT done
  • 18:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice: SWAT: 3e5b33f: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 00m 55s)
  • 18:20 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/CentralNotice: SWAT: 963e963: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 01m 01s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe08fbb: Undeploy reader surveys in English, Polish, and Russian (T232525) (duration: 01m 02s)
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@8ca04df]: deploying refinery (duration: 01m 09s)
  • 18:00 fdans@deploy1001: Started deploy [analytics/refinery@8ca04df]: deploying refinery
  • 16:23 bd808: Our @wikimediatech Twitter account is soft blocked pending phone number verification. bd808 trying to figure out a good way to do that verification for a bot account.
  • 16:14 jynus: restart dbprov2002 after upgrade T236924
  • 16:09 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 100%', diff saved to https://phabricator.wikimedia.org/P9513 and previous config saved to /var/cache/conftool/dbconfig/20191031-160925-jynus.json
  • 15:28 jgleeson: Updated paymentswiki from e28bc54e85 to 0de9d96208
  • 14:56 Urbanecm: Password reset for SUL user `Darth AK`
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119 at 10%', diff saved to https://phabricator.wikimedia.org/P9512 and previous config saved to /var/cache/conftool/dbconfig/20191031-145010-jynus.json
  • 14:28 jynus: reloading ferm on db1119
  • 14:24 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P9511 and previous config saved to /var/cache/conftool/dbconfig/20191031-142455-jynus.json
  • 13:40 effie: upload xdebug 2.7.0-1+wmf2 to component/php72 - T234418
  • 13:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s)
  • 13:16 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json
  • 11:48 jynus: setting pc1008 as a replica of active pc1010
  • 11:43 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s)
  • 11:37 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json
  • 11:24 Urbanecm: EU SWAT done
  • 11:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/ProofreadPage/: SWAT: e0d5ce9: Add page navigation tabs in correct order skin-side and remove js requirement for Vector tab icons (T231250); ed17da2: Makes sure that Vector default background does not override the navigation arrows (T236969) (duration: 01m 02s)
  • 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 547086|Enable ContentTranslation out of Beta in Albanian WP (T236064) (duration: 01m 02s)
  • 11:03 ema: cp5008: restart ats-be to clear "backend process restarted" alert
  • 11:00 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 godog: bounce logstash on logstash2004
  • 10:39 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:38 ema: pool cp5009 with ATS backend T227432
  • 10:37 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:35 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:30 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:29 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:19 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:13 godog: bounce logstash on logstash2004
  • 10:07 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:05 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:43 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 godog: temporarily stop logstash on logstash2005 to test performance with two ingesters only - T215904
  • 09:23 godog: temporarily stop logstash on logstash2006 to test performance with two ingesters only - T215904
  • 09:10 ema: depool cp5009 and reimage as text_ats T227432
  • 08:25 ariel@deploy1001: Finished deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation (duration: 00m 03s)
  • 08:25 ariel@deploy1001: Started deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation
  • 06:37 elukey: upgrade cergen to 0.2.5 on puppetmaster1001
  • 03:44 vgutierrez: switch from nginx to ats-tls on cp4032 - T231627
  • 03:09 vgutierrez: switch from nginx to ats-tls on cp4031 - T231627
  • 02:51 vgutierrez: switch from nginx to ats-tls on cp4030 - T231627
  • 01:41 eileen: civicrm revision changed from 0547c84f73 to d2045c6b98, config revision is 1a709a61aa (looks like patch was still hung in gerrit last time)
  • 01:34 eileen: civicrm revision is 0547c84f73, config revision is 1a709a61aa - that should stop those failmails
  • 00:40 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/WikiLove/resources/ext.wikiLove.icon.vector.css: T236958 Fix Vector icon after upstream change (duration: 01m 02s)
  • 00:38 eileen: civicrm revision changed from a55c2d2787 to 0547c84f73, config revision is 1a709a61aa

2019-10-30

  • 23:21 ejegg: updated fundraising python tools from ffc7bf764b to a93eec292d
  • 23:08 XioNoX: power cycle cr3-esams re1 - T236598
  • 22:29 mutante: scandium - live hack /srv/mediawiki/wmf-config/InitialiseSettings.php - set wmgMemoryLimit to 850 (*1024 *1024), restart php7.2-fpm (T236833)
  • 22:22 andrew@deploy1001: Finished deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode (duration: 03m 15s)
  • 22:19 andrew@deploy1001: Started deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837 (duration: 13m 54s)
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837
  • 21:31 ppchelko@deploy1001: Finished deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838 (duration: 14m 04s)
  • 21:17 ppchelko@deploy1001: Started deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838
  • 20:47 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:47 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:46 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 20:46 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:42 arlolra: Updated Parsoid to 5ac1623 (T235656, T233818, T234549, T227209, T236112)
  • 20:29 otto@deploy1001: Synchronized wmf-config/LabsServices.php: Syncing LabsServices.php change for beta eventgate instance replacement (duration: 01m 01s)
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623 (duration: 09m 10s)
  • 20:25 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 18s)
  • 20:24 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623
  • 20:17 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: WikimediaEditorTasks: Enable edit streaks on beta (duration: 01m 03s)
  • 20:11 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:11 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:10 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 51s)
  • 20:09 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:07 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 07s)
  • 20:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:06 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 23s)
  • 20:06 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 05s)
  • 20:03 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 19:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 19:06 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.4 (duration: 01m 00s)
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.4
  • 19:05 mutante: moscovium - stop and remove rsync server, purge rsync package T180641
  • 18:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T222851 Migrate to Kask for Echo seen-time storage (duration: 01m 01s)
  • 17:43 elukey: upload cergen 0.2.5-1+deb10u1 to buster-wikimedia component/cergen
  • 17:41 elukey: run reprepro clearvanished on install1002 to clean leftovers of buster-wikimedia|thirdparty/elastic7
  • 17:37 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 17:37 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 17:29 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Revert 16:05 UTC T236928 (duration: 01m 05s)
  • 17:26 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Revert 16:02 UTC T236928 (duration: 01m 04s)
  • 16:59 jynus: killed rebuildItemTerms on mwmaint1002
  • 16:05 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T234948) (duration: 01m 04s)
  • 16:02 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 01m 05s)
  • 15:48 godog: roll restart logstash after https://gerrit.wikimedia.org/r/c/operations/puppet/+/544217
  • 15:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 06s)
  • 15:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 05s)
  • 15:36 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:29 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 15:23 gehel: shutting down elastic1039 to be ready for disk swap - T236601
  • 15:10 effie: enable-puppet in mw* hosts
  • 15:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T210174 Load Wikisource extension when wmgUseWikisource is true (duration: 01m 01s)
  • 14:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236502 Define wmgUseWikisource as default-false (duration: 01m 22s)
  • 14:40 ema: pool cp5008 with ATS backend T227432
  • 14:32 effie: disable puppet on all mw* hosts
  • 14:20 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:39 andrew@deploy1001: Finished deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver (duration: 03m 38s)
  • 13:36 andrew@deploy1001: Started deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver
  • 12:59 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=cp5008.eqsin.wmnet
  • 12:58 moritzm: rolling restart of slapd to pick up LDAP schema change
  • 12:57 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet
  • 12:50 arturo: updating package versions in install1002 for thirdparty/kubeadm-k8s stretch-wikimedia (T236824)
  • 12:23 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:22 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 moritzm: temporarily disabling puppet on LDAP servers for a schema change
  • 11:42 ema: depool cp5008 and reimage as text_ats T227432
  • 11:37 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 11:31 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase rate limits for newbie non-ip users on Commons (duration: 01m 01s)
  • 11:13 Urbanecm: EU SWAT done
  • 11:12 Urbanecm: Synchronized wmf-config/InitialiseSettings.php: SWAT: 61cb77c: Re-apply: MCR: Set testwiki to use the new MCR-only schema (T198558) (duration: 00m 59s)
  • 10:07 jynus: restarting bacula-dir, bacula-sd on backup1001 T236406
  • 09:46 vgutierrez: Switch from nginx to ats-tls on cp4029 - T231627
  • 09:34 vgutierrez: Switch from nginx to ats-tls on cp4028 - T231627
  • 09:25 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 08:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 08:25 moritzm: installing php7.0 security updates
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:57 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 05:58 vgutierrez: Rolling restart of ats-tls to get rid of leaked sockets and benefit from the lower inactivity timeout - T236458
  • 04:24 vgutierrez: restarting ats-tls on cp4027 with half open disabled - T236458
  • 03:09 vgutierrez: Rolling restart of prometheus-exporter-trafficserver-tls - T236458
  • 02:40 vgutierrez: restarting ats-tls on cp3050 with half open disabled - T236458
  • 00:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php

2019-10-29

  • 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 23:09 mutante: ganeti1003 - gnt-instance remove ununpentium.wikimedia.org (T236748)
  • 23:05 Urbanecm: Evening SWAT done
  • 23:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/atjwiki* (T236777)
  • 23:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: f7b9972: Revert "Milestone lobo for atjwiki" (T236777) (duration: 01m 01s)
  • 22:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:17 mutante: ununpentium - shutdown Ganeti VM - running decom script, schedule icinga downtime (T236748)
  • 22:14 mutante: rsynced data dump and config from ununpentium to moscovium in /srv/ before shutting down the old server (T180641)
  • 20:43 papaul: rebooting cp3056 for HW check
  • 20:19 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw complete (T235654)
  • 19:42 andrew@deploy1001: Finished deploy [horizon/deploy@dbe892e]: (no justification provided) (duration: 03m 59s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@dbe892e]: (no justification provided)
  • 19:32 jynus: restarting bacula-fd on install1002 T236406
  • 19:31 andrew@deploy1001: Finished deploy [horizon/deploy@bab5d37]: (no justification provided) (duration: 01m 35s)
  • 19:30 andrew@deploy1001: Started deploy [horizon/deploy@bab5d37]: (no justification provided)
  • 19:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.4
  • 19:14 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache (duration: 21m 11s)
  • 18:54 jynus@cumin1001: dbctl commit (dc=all): 'Revert state to before overload+maintenance', diff saved to https://phabricator.wikimedia.org/P9501 and previous config saved to /var/cache/conftool/dbconfig/20191029-185438-jynus.json
  • 18:53 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache
  • 18:53 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw (T235654)
  • 18:50 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.1 (duration: 08m 09s)
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902 (duration: 14m 13s)
  • 18:07 ppchelko@deploy1001: Started deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902
  • 17:42 brennen: cutting branch for 1.35.0-wmf.4
  • 17:38 mutante: phab1001 - upgrading php7.3 packages
  • 17:34 mutante: phab2001 - upgrading PHP packages
  • 17:06 jynus@cumin1001: dbctl commit (dc=all): 'repool db1099 both instances fully to increase redundancy', diff saved to https://phabricator.wikimedia.org/P9499 and previous config saved to /var/cache/conftool/dbconfig/20191029-170648-jynus.json
  • 16:56 jynus@cumin1001: dbctl commit (dc=all): 'depool fully db1105:3311, stability/lag issues', diff saved to https://phabricator.wikimedia.org/P9498 and previous config saved to /var/cache/conftool/dbconfig/20191029-165633-jynus.json
  • 16:52 ssastry@deploy1001: Finished deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d (duration: 09m 35s)
  • 16:46 jynus@cumin1001: dbctl commit (dc=all): 'pool db1106 into s1 rcs', diff saved to https://phabricator.wikimedia.org/P9497 and previous config saved to /var/cache/conftool/dbconfig/20191029-164640-jynus.json
  • 16:43 ssastry@deploy1001: Started deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d
  • 16:39 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 16:28 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 06m 11s)
  • 16:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 16:22 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 16:20 mutante: reloading nginx on wtp*
  • 15:57 bstorm_: restarted ferm on labstore1006 -- it failed an external DNS lookup due to brief issues apparently on the other end
  • 15:25 vgutierrez: restarting ats-tls on cp5007 with a default inactivity timeout of 5 minutes and half open disabled - T236458
  • 15:04 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 15:01 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 14:58 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 14:45 robh: setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538
  • 14:32 Krinkle: krinkle@webperf1001.eqiad Restart navtiming, coal and statsv services
  • 14:29 elukey: upgrade python-kafka on webperf[12]001 - T234808
  • 14:27 Krinkle: krinkle@webperf2001 Restart navtiming, coal and statsv services
  • 12:32 hashar: Restarting Zuul / Jenkins
  • 12:31 hashar: Stopping Zuul / Jenkins for upgrade
  • 12:29 akosiaris: delete all production00 volumes on backup1001
  • 11:48 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 11:37 Urbanecm: EU SWAT done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: faeb8f1: Allow AbuseFilter to issue blocks on es.wikinews (T236730) (duration: 00m 53s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fc9920e: Rename Author talk namespace at thwikisource (T236640) (duration: 00m 56s)
  • 11:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 11:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 10:51 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jakob@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:39 jakob@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:33 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 10:29 moritzm: installing php5 security updates
  • 10:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 10:21 jynus: running import on m1-master, m1 replicas will lag for a while T236406
  • 10:20 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 XioNoX: disable cr3-esams:et-1/0/0 (flapping)
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 gehel: plugin upgrade on relforge - T236123
  • 09:27 godog: reimage elastic 7 hw with Buster
  • 09:27 vgutierrez: restart ats-tls on cp5007 disabling TCP SO_LINGER - T236458
  • 08:43 jynus: shutting down db1099 T227538
  • 08:35 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1099', diff saved to https://phabricator.wikimedia.org/P9492 and previous config saved to /var/cache/conftool/dbconfig/20191029-083547-jynus.json
  • 08:15 XioNoX: push term allow_vmhost ro cr3-esams loopback4 filter - T236598
  • 08:06 vgutierrez: restarting ats-tls on cp5007 with TCP FASTOPEN disabled - T236458
  • 07:40 moritzm: installing php7.3 security updates
  • 07:06 elukey: roll restart java daemons on analytics1042, druid1003 and aqs1004 to pick up new openjdk upgrades
  • 07:01 _joe_: restart memcached on mc1024-1036, 1 hour apart, via cumin (T235188)
  • 06:26 _joe_: restart memcached on mc1023 T23518
  • 03:35 vgutierrez: restarting varnish-frontend on cp5008

2019-10-28

  • 23:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy Echo kask migration to officewiki for testing, part 3 (T222851) (duration: 00m 52s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy Echo kask migration to officewiki for testing, part 2 (T222851) (duration: 00m 52s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/ProductionServices.php: Deploy Echo kask migration to officewiki for testing, part 1 (T222851) (duration: 00m 54s)
  • 23:18 mutante: re-enabling puppet on moscovium (RT)
  • 22:02 ejegg: re-enabled basic fundraising jobs (Queue consumers, audit processors, TY mailer)
  • 20:56 cdanis: restart memcached on mc1022 T235188
  • 20:37 Jeff_Green: authdns update to switch fundraising db service hostname
  • 20:19 ejegg: disabled all fundraising scheduled jobs
  • 19:50 rlazarus: restarted memcached on mc1021 (T235188)
  • 19:41 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 02m 42s)
  • 19:38 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 18:53 moritzm: updating PHP on people1001
  • 18:52 Urbanecm: Morning SWAT done
  • 18:42 urbanecm@deploy1001: Synchronized wmf-config/logging.php: SWAT: 1a09e2a: Direct Parsoid/PHP logs to a parsoid-php log "type" (T235899) (duration: 00m 52s)
  • 18:41 rlazarus: restarted memcached on mc1020 T235188
  • 18:32 mutante: moscovium - rename all files in /etc/request-tracker4/RT_SiteConfig.d to have a .pm extension - this fixed RT - login works again - puppet patch coming up (T180641)
  • 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 30111f3: Enable mapframe at kawiki (T229726) (duration: 00m 53s)
  • 18:28 mutante: moscovium - deleting /etc/request-tracker4/RT_SiteConfig.d/ 50-debconf.pm and 51-dbconfig-common.pm which duplicate the same files without .pm extension with wrong values, probably due to some package change (T180641)
  • 18:27 jgleeson: updated paymentswiki from 7bb9f5257e to e28bc54e85
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: c48271d: Revert "Config changes for Echo kask migration" (T222851) (duration: 00m 53s)
  • 18:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditor.php: SWAT: b19ad5f: Revert "Revert "ApiVisualEditor: Return etag with content for preloaded content""; 4f3b724: ApiVisualEditor: Fix preload handling further (T233320) (duration: 00m 53s)
  • 18:15 Urbanecm: Run mwscript namespaceDupes.php --wiki=thwikisource --fix (T236640)
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ea927dd: Rename author NS at thwikisource (T236640) (duration: 00m 53s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: ddaa534: Config changes for Echo kask migration (T222851) (duration: 00m 55s)
  • 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:12 bblack: mr1-eqiad: fix bast3004 access for eqiad mgmt network - T236686
  • 17:11 _joe_: starting rolling restart of memcached servers in eqiad, beginning with mc1019 T235188
  • 17:11 bblack: mr1-codfw: fix bast3004 access for codfw mgmt network - T236686
  • 17:10 bblack: mr1-ulsfo: fix bast3004 access for ulsfo mgmt network - T236686
  • 16:57 bblack: mr1-eqsin: fix bast3004 access for eqsin mgmt network - T236686
  • 16:56 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:55 bblack: mr1-esams: fix bast3004 access for esams mgmt network - T236686
  • 16:36 jbond42: restart puppetdb on puppetdb1001 to remove queue
  • 13:50 ema: pool cp5007 with ATS backend T227432
  • 13:30 godog: roll restart logstash in codfw/eqiad to apply new config
  • 13:23 effie: enable puppet on mw1*, depool and repool to reload apache - T229792
  • 13:13 effie: enable puppet on mw[1261-1265].eqiad.wmnet (mw canaries), depool and repool to reload apache - T229792
  • 13:07 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 effie: enable puppet on mw2* servers, depool and repool to reload apache - T229792
  • 13:01 jynus: stop db1114 for testing
  • 12:30 ema: depool cp5007 and reimage as text_ats T227432
  • 12:22 effie: depool mw2150
  • 11:56 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001 (duration: 00m 05s)
  • 11:56 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001
  • 11:34 Urbanecm: EU SWAT done
  • 11:33 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: 8caf681: Dont log missing ETags when creating a new page, thats normal (T233320) (duration: 00m 54s)
  • 11:33 effie: Disable puppet on mw* for 545652 - T229792
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dd2f06c: Add Translate channel for the Translate extension (T221119) (duration: 00m 53s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ff17666: Adjust wgUploadNavigationUrl for azwiki to point to commons UpWiz (T236307) (duration: 00m 53s)
  • 11:05 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 7e26ef4: Revert "Restrict uploads on azwiki" (T236307) (duration: 00m 53s)
  • 11:02 moritzm: installing OpenJDK security updates on elastic*
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 08:48 godog: bump udp_localhost kafka-logging topics to 6 partitions and roll-restart logstash and rsyslog - T215904
  • 08:26 volans: manually cleanup changes reverted in https://gerrit.wikimedia.org/r/546407 on icinga[12]001 - T222074
  • 08:25 moritzm: installing file/libmagic security updates
  • 08:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791 (duration: 13m 42s)
  • 08:15 godog: swift eqiad-prod: final weight to ms-be105[1-6] - T232367
  • 08:02 mobrovac@deploy1001: Started deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389 (duration: 13m 44s)
  • 07:40 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt (duration: 00m 05s)
  • 07:40 elukey@deploy1001: Started deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt
  • 07:37 elukey: upload archiva 2.2.4-1 to wikimedia-stretch (fix to avoid overriding archiva.xml upon install)
  • 07:27 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389
  • 07:25 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org (duration: 02m 37s)
  • 07:22 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org

2019-10-26

  • 11:30 XioNoX: restart cr3-esams
  • 11:01 XioNoX: re0.cr3-esams> request chassis routing-engine master switch

2019-10-25

  • 22:55 mutante: moscovium rm /dev/shm/envoy_shared_memory_0 to revive envoy which failed to run after changing ports and reinstalling it (T180641)
  • 22:42 mutante: moscovium - manually deleting envoy listener on 1443 and letting puppet recreate config because it's not removed if you change the port (T180641)
  • 21:55 mutante: running puppet on ulsfo cp-ats servers to pick up config change for RT backend
  • 20:42 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes (duration: 00m 06s)
  • 20:41 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: test deploy design/style-guide (duration: 00m 10s)
  • 20:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: test deploy design/style-guide
  • 17:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 bblack: lvs3005 - reimaging to fix partman issue, high-traffic1 (text) to lvs3007 for the duration
  • 16:43 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 bblack: lvs3006 - reimaging to fix partman issue, high-traffic2 (upload/maps) to lvs3007 for the duration
  • 16:19 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292 (duration: 13m 31s)
  • 16:05 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292
  • 16:04 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292 (duration: 00m 43s)
  • 16:04 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292
  • 15:35 robh: ps1-oe14-esams ip info set, rebooting (won't affect servers) via T184066
  • 15:03 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 15:01 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:00 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 bblack: cr[23]-esams: re-route ns2 IP to ganeti3003
  • 14:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292 (duration: 00m 44s)
  • 14:31 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292
  • 14:30 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292 (duration: 00m 05s)
  • 14:30 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292
  • 14:28 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292 (duration: 01m 02s)
  • 14:27 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292
  • 14:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:10 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:09 bblack: reboot ganeti3003
  • 13:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 ema: pool cp4032 with ATS backend T227432
  • 13:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 effie: depool mw1334 and pool back
  • 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4032.ulsfo.wmnet,service=ats-be
  • 13:05 ema: depool cp4032 and reimage as text_ats T227432
  • 12:34 jynus: introducing new freshness check for bacula T234900
  • 12:11 ema: pool cp4031 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:59 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4031.ulsfo.wmnet,service=ats-be
  • 09:56 ema: depool cp4031 and reimage as text_ats T227432
  • 09:39 ema: pool cp4030 with ATS backend T227432
  • 09:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 XioNoX: powering off mr1-esams again
  • 09:20 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 XioNoX: going to power down mr1-esams (esams mgmt is going to go down) for 30 min, the time needed to move power cables
  • 09:02 jynus: disabling persistent journald on db1074
  • 09:01 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4030.ulsfo.wmnet,service=ats-be
  • 08:58 ema: depool cp4030 and reimage as text_ats T227432
  • 08:48 vgutierrez: switch from nginx to ats-tls on cp3050 - T231627
  • 08:45 godog: stop prometheus on bast300[24] and done last round of rsync data - T236329
  • 08:37 ema: lvs1015: restart pybal to add labweb-ssl T210411
  • 08:36 ema: test
  • 08:34 ema@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
  • 08:32 ema: lvs1016: restart pybal to add labweb-ssl T210411
  • 08:02 vgutierrez: rolling restart of ats-tls to introduce a SSL handshake timeout of 60 secs - T236458
  • 07:35 akosiaris: reboot webperf1002 for disk resize T235455
  • 07:29 akosiaris: reboot webperf2002 for disk resize T235455
  • 05:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:35 vgutierrez: reimage lvs3007 to let it get the proper partman configuration - T236294
  • 05:03 vgutierrez: Applying a SSL handshake timeout of 60 secs on ats-tls/cp5007 - T236458
  • 04:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:55 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:53 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:52 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:50 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:49 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:24 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns3001.*
  • 03:08 bblack: cr2-esams + cr3-esams : remove nescio and maerlant from anycast4 neighbor list
  • 03:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 03:05 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3049.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 02:44 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3043.esams.wmnet
  • 02:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 02:09 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 01:52 bblack: mr1-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:50 bblack: asw2-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:46 bblack: cr2-esams + cr3-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 01:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3047.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3046.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 01:13 mutante: puppetmaster1001 - revoking parsoid.svc.eqiad / parsoid.svc.codfw / parsoid.discovery.wmnet certificates and creating new ones including parsoid-php.discovery.wmnet (T233654)
  • 00:52 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/LiquidThreads/classes/View.php: (no justification provided) (duration: 00m 54s)

2019-10-24

  • 23:46 mutante: bast3002 - rsyncing /home, /srv/tfptboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329)
  • 23:24 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s)
  • 22:16 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:13 mutante: gerrit1001 - starting gerrit
  • 22:13 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 thcipriani: stopping gerrit briefly for script run for T236344
  • 22:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:01 mutante: mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. Could cause monitoring spam on more servers (T229792)
  • 21:56 eileen: civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1
  • 21:16 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
  • 21:16 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 21:12 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet
  • 21:06 bblack: cr3-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 bblack: cr2-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 urandom: restbase cassandra rolling restart, codfw / rack 'd' -- T200803
  • 21:02 bblack: downtimed lvs3001-4, stopping pybal there, etc...
  • 20:58 bblack: cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:58 bblack: cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:40 bblack: esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently)
  • 20:36 bblack: esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again)
  • 20:33 urandom: restbase cassandra rolling restart, codfw / rack 'c' -- T200803
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
  • 20:23 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet
  • 20:22 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 20:04 bblack: reboot cp3054 again for good measure
  • 19:57 bblack: cp3054 - trying racadm serveraction hardreset
  • 19:32 bblack: reboot dns3001
  • 19:31 urandom: restbase cassandra rolling restart, codfw / rack 'b' -- T200803
  • 19:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:06 urandom: restbase cassandra rolling restart, rack 'd' -- T200803
  • 19:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:59 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:57 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 Urbanecm: Morning SWAT done
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 urandom: restbase cassandra rolling restart, rack 'b' -- T200803
  • 18:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:31 bblack: cr3-esams: add dns3001 to anycast4 neighbors
  • 18:30 bblack: cr2-esams: add dns3001 to anycast4 neighbors
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 263fd0f: Enable Wikibase client access on commonswiki (T223792) (duration: 00m 52s)
  • 18:25 urandom: restbase cassandra rolling restart, rack 'a' -- T200803
  • 18:22 robh: completing ps1-b6-eqiad setup, pdu will reboot twice, power output unaffected T227540
  • 18:20 robh: ps1-a6-eqiad setup complete, icinga errors should clear up T227142
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 84c48df: rename service definition (T222851) (duration: 00m 53s)
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b20d6de: Reference Previews: full beta deployment (T235083) (duration: 00m 52s)
  • 18:03 robh: setting ip info for ps1-a6-eqiad, it is rebooting. T227142
  • 17:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 ema: pool cp3059 (cache_upload) T233242
  • 17:29 bblack: asw2-esams - committing switch port/vlan config for new rack 14 hosts
  • 17:26 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable Parsoid/PHP in the whole wtp (a.k.a. Parsoid) cluster - T236388 (duration: 00m 53s)
  • 17:18 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:54 ema: depool cp3036 (cache_upload) T233242
  • 16:39 urandom: restarting cassandra, restbase2011 (canary for config changes) -- T200803
  • 16:32 urandom: restarting cassandra, restbase1016 (canary for config changes) -- T200803
  • 16:28 ema: depool cp3035 (cache_upload) T233242
  • 16:07 ema: pool cp3057 (cache_upload) T233242
  • 15:51 ema: depool cp3032 (cache_text) T233242
  • 15:45 ema: depool cp3034 (cache_upload) T233242
  • 15:40 ema: depool cp3030 (cache_text) T233242
  • 15:27 bblack: asw2-esams: configure port descriptions and vlan/lvs groupings for all rack16 hosts (lvs3007, ganeti3003, bast3004, cp3061-5)
  • 15:19 ema: pool cp3058 (cache_text) T233242
  • 15:18 effie: Slowly reload apache across the fleet (as we are enabling puppet) - T229792
  • 15:09 effie: Remove hhvm packages and enable puppet across the fleet - T229792
  • 15:09 ema: pool cp3055 (cache_upload) T233242
  • 15:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testcommonswiki, Enable Wikibase client access T223792 (duration: 00m 53s)
  • 15:00 bblack: cr2-esams - add missing lvs3005 IP to bgp pybal neighbor list
  • 14:58 bblack: cr3-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:58 bblack: cr2-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:47 effie: run puppet on all canaries and codfw - T229792
  • 14:42 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:40 effie: Remove hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from all canaries and codfw - T229792
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:26 bblack: lvs3006 (upload, becoming active) - manual pybal med s/90/0/ (will take over from lvs3002, intended permanently).
  • 14:23 bblack: lvs3006 (upload, inactive) - manual pybal med s/100/90/ (preferred to lvs3004 for fallback from lvs3002)
  • 14:22 effie: enable puppet on mw app canaries
  • 14:16 ema: power-cycle cp3056, stuck rebooting into d-i T233242
  • 13:59 ema: pool cp3060 T233242
  • 13:36 bblack: re-pooling esams in dns
  • 13:34 effie: enable puppet on mwdebug*
  • 13:25 XioNoX: enable transit4/6 on cr2-knams
  • 13:24 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=varnish-be,name=cp30[56].*
  • 13:24 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp30[56].*,service=varnish-be
  • 13:23 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=nginx
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=nginx
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3063.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3051.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3059.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3061.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3057.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3065.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3055.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3053.esams.wmnet
  • 13:17 ema: set ats-be weights on new esams upload nodes T233242
  • 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.3
  • 12:56 effie: purge hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from mw* canaries - T229792
  • 12:42 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp3060.esams.wmnet,service=varnish-be
  • 12:33 effie: Stopping puppet on all hosts including the hhvm class (C:hhvm) - 544864 - T229792
  • 12:25 ema: cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242
  • 12:14 bblack: depool esams in geodns
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 after analyze table', diff saved to https://phabricator.wikimedia.org/P9468 and previous config saved to /var/cache/conftool/dbconfig/20191024-120812-marostegui.json
  • 12:06 XioNoX: shutdown cr1-esams - cr2-knams link
  • 12:00 XioNoX: shutdown transit BGP sessions on cr2-knams
  • 11:40 Urbanecm: EU SWAT done
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3a5cb68: Permission changes of move-rootuserpages assignment at commonswiki (T236359) (duration: 01m 00s)
  • 11:33 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:31 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 Urbanecm: Run mwscript namespaceDupes.php --wiki=commonswiki --add-prefix=FIXME --fix (T236352)
  • 11:28 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e079956: Add CAT as alias for NS_CATEGORY at commonswiki (T236352) (duration: 01m 00s)
  • 11:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 2d66deb: Restrict uploads on azwiki (T236307) (duration: 01m 03s)
  • 11:15 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/WikibaseMediaInfo: Also use custom PrefetchingTermLookup in SingleEntitySourceServices (duration: 01m 01s)
  • 11:13 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Allow defining entity-type-specific PrefetchingTermLookup (duration: 01m 06s)
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights for db1093 and db1085', diff saved to https://phabricator.wikimedia.org/P9466 and previous config saved to /var/cache/conftool/dbconfig/20191024-101810-marostegui.json
  • 09:59 hashar: Converting CI jobs to use the new PostBuildScript plugin config | https://gerrit.wikimedia.org/r/#/c/integration/config/+/544907/ | T188398
  • 09:57 hashar: Restarting CI Jenkins
  • 09:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T234853 Re-enable performance perception survey on ruwiki (duration: 01m 04s)
  • 08:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:36 godog: roll restart rsyslog in codfw/eqiad to pick up new kafka partitions
  • 08:18 godog: roll restart rsyslog in ulsfo/esams/eqsin to pick up new kafka partitions
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092 for analyze table', diff saved to https://phabricator.wikimedia.org/P9465 and previous config saved to /var/cache/conftool/dbconfig/20191024-081519-marostegui.json
  • 07:57 XioNoX: reboot mr1-esams
  • 07:42 godog: bump rsyslog- topics partitions to 6 and roll-restart logstash frontends
  • 07:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:22 XioNoX: drain Telia link on cr2-esams
  • 06:32 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid-php,name=eqiad
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9463 and previous config saved to /var/cache/conftool/dbconfig/20191024-052002-marostegui.json
  • 05:18 marostegui: Run analyze enwiki.revision on db2092 T223151
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9462 and previous config saved to /var/cache/conftool/dbconfig/20191024-045954-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from special slaves group and leave it with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9461 and previous config saved to /var/cache/conftool/dbconfig/20191024-045924-marostegui.json
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9460 and previous config saved to /var/cache/conftool/dbconfig/20191024-045544-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:55 shdubsh: temporarily turn down accept delay on fermium - T235983
  • 00:03 mutante: restarting gerrit to increase heap_size from 20G to 32G (T225166 T222391)

2019-10-23

  • 22:55 brennen@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/AbuseFilter: SWAT: Unbreak filter edit form (T236286) (duration: 01m 05s)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:20 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 05s)
  • 22:19 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:15 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 01m 10s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:00 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:00 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 21:32 mutante: webperf1002/2002 - starting bacula-fd service that failed after initial puppet run turning them into backup::hosts
  • 21:14 ejegg: updated Fundraising python tools from b3c7453be2 to ffc7bf764b
  • 20:37 shdubsh: restart nagios-nrpe-server on stat1007
  • 18:56 milimetric@deploy1001: Finished deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts (duration: 07m 53s)
  • 18:49 milimetric@deploy1001: Started deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts
  • 18:29 mforns@deploy1001: Finished deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59 (duration: 06m 40s)
  • 18:22 mforns@deploy1001: Started deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59
  • 17:31 akosiaris: restart varnish-be on cp1089 as a response to HTTP availability alerts. High mailbox lag
  • 17:25 akosiaris: restart varnish-be on cp1081 as a response to HTTP availability alerts
  • 15:55 _joe_: restarting pybal on lvs2006, then 2003 for picking up parsoid-php
  • 15:32 marostegui: Enable slow query log 1/20 on db1089 (enwiki) T223151
  • 14:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:39 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:38 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:37 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:35 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:19 bblack: repooling esams
  • 14:00 hashar: Restarting CI Jenkins
  • 13:57 _joe_: manually changing the symlinked deployed version of parsoid on wtp1025 T236275
  • 13:35 XioNoX: migrate esams mgmt to new mgmt router
  • 13:34 effie: disable puppet on mwdebug1002 - T214734
  • 13:13 ssastry@deploy1001: Finished deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues (duration: 08m 44s)
  • 13:07 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.3 (duration: 01m 00s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.3
  • 13:04 ssastry@deploy1001: Started deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues
  • 12:37 effie: Depool mwdebug1002 - T214734
  • 12:31 vgutierrez: restarting ats-tls on cache text nodes - T233274
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from the special slaves group on s5 and leave it back with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9454 and previous config saved to /var/cache/conftool/dbconfig/20191023-122708-marostegui.json
  • 11:26 XioNoX: powering down cr1-esams
  • 11:24 Urbanecm: EU SWAT done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: e21054e: Add Balinese to interwiki sort orders (T234768) (duration: 01m 01s)
  • 11:18 Urbanecm: mwscript updateArticleCount.php --wiki=frwikiquote --update (T236212)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (2/2; T234278) (duration: 01m 01s)
  • 11:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (1/2; T234278) (duration: 01m 01s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf8e2f1: Set $wgArticleCountMethod to any for frwikiquote (T236212) (duration: 01m 12s)
  • 10:46 ema: cp-ats: rolling ATS backend restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545522/ T233274
  • 10:13 jynus: reverting dbtree revision to HEAD~1 T224589
  • 10:11 jynus: deploying new version of dbtree T224589
  • 10:04 ema: cp1075: ats-backend-restart to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545508/
  • 09:42 godog: bounce burrow-logging-eqiad.service on kafkamon1001
  • 09:40 godog: roll restart logstash to pick up new rsyslog-notice partitions
  • 09:31 godog: bump rsyslog-notice topic to 6 partitions
  • 09:00 moritzm: rebooting logstash2021 for some firmware tests
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 moritzm: installing systemd bugfix update on mw canaries
  • 08:50 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 godog: roll restart rsyslog on cirrus and wdqs hosts to pick up changes to logback topic partitions
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312 after table compression', diff saved to https://phabricator.wikimedia.org/P9452 and previous config saved to /var/cache/conftool/dbconfig/20191023-082826-marostegui.json
  • 08:23 godog: roll restart logstash in codfw/eqiad to pick up new kafka partitions
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9451 and previous config saved to /var/cache/conftool/dbconfig/20191023-082246-marostegui.json
  • 08:11 godog: kafka-logging eqiad set 12 partitions for ^mwlog- ^logback- and eqiad.client.error topics
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9450 and previous config saved to /var/cache/conftool/dbconfig/20191023-080857-marostegui.json
  • 07:55 godog: kafka-logging delete unused topic syslog-notice
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9449 and previous config saved to /var/cache/conftool/dbconfig/20191023-075106-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9448 and previous config saved to /var/cache/conftool/dbconfig/20191023-074828-marostegui.json
  • 07:46 XioNoX: powering down cr2-esams for relocation (for real this time)
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9447 and previous config saved to /var/cache/conftool/dbconfig/20191023-073831-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9446 and previous config saved to /var/cache/conftool/dbconfig/20191023-073556-marostegui.json
  • 07:30 XioNoX: powering down cr2-esams for relocation
  • 07:28 hashar: logstash: refreshing index fields for logstash-* indices (via https://logstash.wikimedia.org/app/kibana#/management/kibana/indices/logstash-* ) # T234564
  • 07:05 XioNoX: redirect ns2 to eqiad - T235805
  • 07:04 marostegui: Enable slow query log 1/10 on db1089 (enwiki) T223151
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:59 XioNoX: depool esams - T235805
  • 06:57 effie: Depooling mw1317
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:38 marostegui: Compress tables on db1097:3315 T235599
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9445 and previous config saved to /var/cache/conftool/dbconfig/20191023-063800-marostegui.json
  • 05:29 ema@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kibana,name=codfw
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9444 and previous config saved to /var/cache/conftool/dbconfig/20191023-052940-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9443 and previous config saved to /var/cache/conftool/dbconfig/20191023-050812-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9442 and previous config saved to /var/cache/conftool/dbconfig/20191023-045722-marostegui.json
  • 04:49 vgutierrez: repool cp5007 - T234887
  • 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9441 and previous config saved to /var/cache/conftool/dbconfig/20191023-044833-marostegui.json
  • 04:36 MaxSem: Fixed a page title via namespaceDupes.php on pswiki
  • 03:51 vgutierrez: depool cp5007 - T234887

2019-10-22

  • 23:57 maxsem@deploy1001: Synchronized php-1.35.0-wmf.3/includes/block/DatabaseBlock.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/545373/ (duration: 00m 59s)
  • 23:53 maxsem@deploy1001: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543943/ (duration: 01m 01s)
  • 23:43 maxsem@deploy1001: Synchronized dblists/: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 00m 59s)
  • 23:41 maxsem@deploy1001: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 01s)
  • 23:38 maxsem@deploy1001: Synchronized dblists/labtestwiki.dblist: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 02s)
  • 23:32 mutante: LDAP - added keepit-ssh to wmf group (T236209)
  • 22:23 ejegg: updated Fundraising CiviCRM from ff69d64ad4 to 47e0800001
  • 21:57 thcipriani: stopping gerrit to run ref-update script T236114
  • 21:57 thcipriani: stopping gerrit to run ref-update script
  • 21:45 mutante: LDAP - added lexnasser to nda group (T235688)
  • 21:07 eileen: process-control config revision is 95ee1bafb3 dedupe job re-enabled
  • 20:09 mutante: gerrit1001 - mkdir /srv/gerrit/cobalt/git - rsyncing /srv/gerrit/git from cobalt to /srv/gerrit/cobalt/git/ on gerrit1001 (T236114)
  • 19:42 hashar: gerrit1001: apt install colordiff # T236114
  • 19:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.3
  • 19:03 brennen: proceeding with train for 1.35.0-wmf.3
  • 18:09 mutante: DNS - added new Wikipedia language "mnw" (Mon) T235739 - a language spoken in Myanmar
  • 17:59 sbassett: Uploaded and applied (but did not deploy per releng) security fix for T234450 to wmf.3
  • 17:57 sbassett: Deployed security fix for T234450 to wmf.2
  • 17:57 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213) (duration: 05m 14s)
  • 17:54 mutante: restarting gerrit to disable jgit gc (T236114)
  • 17:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213)
  • 17:37 arlolra: Updated Parsoid to cf01d91 (T234057, T234768, T235296, T235684, T235563)
  • 17:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91 (duration: 07m 37s)
  • 17:20 bblack: geodns: re-pooling esams (at this point, we're entirely back in our "normal" state of affairs)
  • 17:19 arlolra@deploy1001: Started deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91
  • 16:51 bblack: geodns: moving all "normal" eqiad traffic back to eqiad (in addition to the esams-diverted traffic which is still pointed mostly at eqiad right now)
  • 16:21 mutante: running puppet on deployment servers
  • 16:20 thcipriani: restarting gerrit
  • 16:14 thcipriani: stopping gerrit to run a fix for T222391
  • 15:58 bblack: depooling esams temporarily to test traffic scenario on lvs1014
  • 15:47 bblack: enable pybal+puppet on rebooted lvs1014
  • 15:40 bblack: rebooting lvs1014
  • 15:28 liw@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache (duration: 37m 39s)
  • 15:26 XioNoX: repool esams
  • 15:20 XioNoX: rollback ns2 redirect
  • 15:13 bblack: re-disabling lvs1014 ...
  • 15:10 bblack: re-enabling lvs1014 pybal/puppet
  • 15:03 moritzm: rebooting kafka-main1005 for microcode debugging
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:52 bblack: stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016)
  • 14:50 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache
  • 14:45 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0 (duration: 02m 44s)
  • 14:42 mbsantos@deploy1001: Started deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0
  • 14:13 XioNoX: restart asw-esams for onsite work
  • 13:52 andrewbogott: restarted slapd on ldap-eqiad-replica01
  • 13:38 gehel: silencing LVS check for kartotherian (we know there is an issue) - T236163
  • 13:35 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_2419219323" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 06m 40s)
  • 13:28 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.3 and rebuild l10n cache
  • 13:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 XioNoX: depool esams for onsite work - T235805
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3316 db1105:3311 db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9434 and previous config saved to /var/cache/conftool/dbconfig/20191022-130556-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9433 and previous config saved to /var/cache/conftool/dbconfig/20191022-125435-marostegui.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9432 and previous config saved to /var/cache/conftool/dbconfig/20191022-124607-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3316 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9431 and previous config saved to /var/cache/conftool/dbconfig/20191022-123757-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3312 and db1105:3311 after on-site maintenance T235877', diff saved to https://phabricator.wikimedia.org/P9430 and previous config saved to /var/cache/conftool/dbconfig/20191022-123257-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315', diff saved to https://phabricator.wikimedia.org/P9429 and previous config saved to /var/cache/conftool/dbconfig/20191022-123032-marostegui.json
  • 12:29 moritzm: rebooting miscweb2001 for some microcode tests
  • 12:28 marostegui: Compress db1096:3315
  • 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 after PDU maintenance T227142 (duration: 00m 50s)
  • 12:15 jynus: reimage to buster dbmonitor2001.wikimedia.org T224589
  • 11:57 liw: starting to cut branch for train 1.35-wmf.3
  • 11:51 hashar: Restarted CI Jenkins on contint1001
  • 11:35 marostegui: Stop MySQL on db1105:3311, db1105:3312 for firmware upgrade - T235877
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312 for firmware upgrade T235877', diff saved to https://phabricator.wikimedia.org/P9428 and previous config saved to /var/cache/conftool/dbconfig/20191022-113437-marostegui.json
  • 11:29 Urbanecm: EU SWAT done
  • 11:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor/: SWAT: 2bc4420 (T235707); 680a98b (T233320); d83265d (T234564) (duration: 00m 53s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0593f34: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections (T230614) (duration: 00m 54s)
  • 10:55 moritzm: rebooting rpki2001 for some microcode tests
  • 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:37 ema@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kibana
  • 10:32 jynus: shutting down db1115 in preparation for PDU maintenance, this will make tendril and dbtree unavailable for 2 hours T227142
  • 10:21 ema: lvs2003: restart pybal to add new service kibana-ssl T210411
  • 10:18 ema: lvs1015: restart pybal to add new service kibana-ssl T210411
  • 10:14 ema: puppetmaster1001: rm /var/run/confd-template/.kibana-ssl*.err to make confd icinga check happy T210411
  • 10:02 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=kibana-ssl
  • 09:54 ema: lvs2006: restart pybal to add new service kibana-ssl T210411
  • 09:54 ema: lvs1016: restart pybal to add new service kibana-ssl T210411
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9425 and previous config saved to /var/cache/conftool/dbconfig/20191022-091327-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9424 and previous config saved to /var/cache/conftool/dbconfig/20191022-091051-marostegui.json
  • 08:05 marostegui: Stop MySQL on labsdb1012 for PDU work T227142
  • 07:53 marostegui: Stop MySQL on db1116 pc1007 db1096:3315, db1096:3316 for PDU maintenance T227142
  • 07:18 moritzm: installing tcpdump security updates
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1010 T227142 (duration: 00m 52s)
  • 06:32 vgutierrez: rolling restart of ats-tls - T233274 T234803
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9423 and previous config saved to /var/cache/conftool/dbconfig/20191022-055151-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1070 from config T235464', diff saved to https://phabricator.wikimedia.org/P9422 and previous config saved to /var/cache/conftool/dbconfig/20191022-054759-marostegui.json
  • 05:41 marostegui: Stop mysql on db1070 - T235464
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1070 from config T235464 (duration: 00m 51s)
  • 05:40 marostegui: Remove db1070 from tendril and zarcillo - T235464
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1070 from config T235464 (duration: 00m 53s)
  • 05:33 vgutierrez: Switch from nginx to ats-tls on cp1090 - T231433
  • 05:24 vgutierrez: repooling cp2025 - T231433
  • 05:20 vgutierrez: depooling cp2025 to fix ATS/nginx configuration - T231433
  • 05:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:08 vgutierrez: Switch from nginx to ats-tls on cp1088 - T231433
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9421 and previous config saved to /var/cache/conftool/dbconfig/20191022-050204-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9420 and previous config saved to /var/cache/conftool/dbconfig/20191022-050048-marostegui.json
  • 04:58 vgutierrez: Switch from nginx to ats-tls on cp2026 - T231433
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp2024 - T231433
  • 04:18 vgutierrez: Switch from nginx to ats-tls on cp3049 - T231433
  • 03:44 vgutierrez: Switch from nginx to ats-tls on cp3047 - T231433
  • 01:12 eileen: disabled dedupe job pending T236096 deploy
  • 01:12 eileen: process-control config revision is 782a14c7d9

2019-10-21

  • 23:15 thcipriani: ops/puppet:sudo -u gerrit2 git update-ref refs/changes/66/535966/meta d6909e0 && sudo -u gerrit2 git update-ref refs/changes/66/535966/meta 8494c28 on gerrit1001
  • 23:11 mutante: rsynced operations/puppet.git/objects from cobalt to gerrit1001 (and backup in /root) (T222391)
  • 22:23 mutante: mw1340 - restarting php7.2-fpm, restarting apache2
  • 21:27 mutante: gerrit1001 manually running command from "list_mediawiki_extensions" cron (T222391)
  • 21:26 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b 30 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 21:23 thcipriani: ssh -p 29418 gerrit.wikimedia.org -- gerrit index start changes --force
  • 21:21 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2, ran puppet again. gerrit back up (T222391)
  • 21:18 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2
  • 21:16 cdanis: previous cumin invocation was to unblock gerrit migration; will be automatically restored to usual on next puppet run. T222391
  • 21:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin A:dns-auth 'perl -p -i".bak" -e "s/gerrit\./gerrit-replica./" /etc/wikimedia-authdns.conf'
  • 20:57 mutante: running puppet on gerrit1001
  • 20:57 thcipriani: running puppet on cobalt
  • 20:52 mutante: rsyncing gerrit-data/plugins and /var/lib/gerrit2/review_site/ again
  • 20:51 mutante: rsyncing gerrit-data/git again
  • 20:50 thcipriani: stopping gerrit on cobalt
  • 20:44 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 52s)
  • 20:37 mutante: disabled puppet on cobalt and gerrit2001
  • 20:29 mutante: running puppet on dbproxy10017 to apply ferm change for gerrit db from gerrit1001 (T222391)
  • 20:25 mutante: gerrit1001 - puppet agent disabled - gerrit service stopped
  • 20:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f (duration: 06m 02s)
  • 20:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f
  • 20:12 mutante: rsyncing /var/lib/gerrit2/review_site from cobalt to gerrit1001 (T222391)
  • 20:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/545027/ T235949 (duration: 00m 52s)
  • 20:08 mutante: rsynced /srv/gerrit/plugins from cobalt to gerrit1001 (T222391)
  • 20:08 mutante: rsynced /srv/gerrit/git from cobalt to gerrit1001 (T222391)
  • 18:43 Urbanecm: Morning SWAT done
  • 18:41 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor: SWAT: a4ab456: TreeModifier: Ignore removed nodes properly when normalizing from a text node (T235959); ecb4532: Update VE core submodule to a4ab456dc0 (T235959); a850cee: ApiVisualEditor: Always return etag with content (T233320) (duration: 00m 55s)
  • 18:32 robh: ps1-23-ulsfo back online, all pdu work in ulsfo is now complete T235911
  • 18:30 robh: ps1-22-ulsfo repaired (reseating its NIC rebooted its mgmt interface) Done with it and repeating on ps1-23-ulsfo via T235911
  • 18:24 robh: working on ps1-22-ulsfo via T235911 (it may flap but it is already ack'd as down in icinga, but not persistent)
  • 17:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@75c0577]: GUI Updates (duration: 11m 37s)
  • 17:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/: Update VisualEditor for set of back-ports in wmf.1 T233320, T234564, T235959 (duration: 00m 56s)
  • 17:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@75c0577]: GUI Updates
  • 14:16 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.2 refs T233850
  • 13:46 Urbanecm: Deploy sec patch for T104807
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3314 and db2091:3312 for table compression', diff saved to https://phabricator.wikimedia.org/P9412 and previous config saved to /var/cache/conftool/dbconfig/20191021-132633-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9411 and previous config saved to /var/cache/conftool/dbconfig/20191021-132440-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9410 and previous config saved to /var/cache/conftool/dbconfig/20191021-132145-marostegui.json
  • 13:07 ema: lvs1015: restart pybal to add new service wdqs-ssl T210411
  • 13:04 marostegui: Deploy schema change on db1122 (s2 primary master) - T233135 T234066
  • 13:04 ema: lvs2003: restart pybal to add new service wdqs-ssl T210411
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312 after schema change and remove db1129 from vslow and dump as it was there temporarily', diff saved to https://phabricator.wikimedia.org/P9409 and previous config saved to /var/cache/conftool/dbconfig/20191021-130355-marostegui.json
  • 13:02 ema: lvs1016: restart pybal to add new service wdqs-ssl T210411
  • 13:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wdqs-ssl
  • 12:58 ema: lvs2006: restart pybal to add new service wdqs-ssl T210411
  • 12:38 hashar: Started zuul-merger on contint2001
  • 12:32 hashar: Stopped zuul-merger on contint2001
  • 12:31 hashar: Started zuul-merger on contint1001
  • 12:16 hashar: Stopped zuul-merger on contint1001
  • 12:02 Urbanecm: EU SWAT finally done
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e8d70c1: Partial cleanup of InitialiseSettings (T231178) (duration: 01m 00s)
  • 12:00 Urbanecm: I'm going to do one last sync for EU SWAT
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 12e3549: Create Portal namespace for sawikisource (T235343) (duration: 00m 59s)
  • 11:55 urbanecm@deploy1001: sync-file aborted: SWAT: 12e3549: Create Portal namespace for sawikisource (duration: 00m 01s)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3b1350b: wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there (T235904) (duration: 00m 59s)
  • 11:49 Urbanecm: Reopen EU SWAT
  • 11:42 awight: EU SWAT complete
  • 11:42 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Put reference previews back into beta mode on beta cluster (T233813) (duration: 01m 00s)
  • 11:38 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 543764|Enable ContentTranslation out of Beta in Malayalam/Bengali/Mongolian WPs (T233008, T233009, T234317) (duration: 01m 00s)
  • 11:34 moritzm: installing Java security updates on restbase-dev1004
  • 11:30 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/tests/phpunit/includes/Storage/SqlBlobStoreTest.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 3/3 - T235188 (duration: 01m 00s)
  • 11:28 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 2/3 - T235188 (duration: 00m 59s)
  • 11:25 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/Storage/SqlBlobStore.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 1/3 - T235188 (duration: 01m 00s)
  • 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:19 hashar: contint1001 / contint2001 : marking integration/config zuul merger repo readonly: sudo chown -R root:root /srv/zuul/git/integration/config
  • 10:13 hashar: CI in trouble due to a huge number of changes
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 Amir1: maintenance script is done
  • 09:35 moritzm: removing PHP 7.0 from deployment servers
  • 09:20 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T234774)
  • 09:18 moritzm: installing php7.0 security updates
  • 09:11 moritzm: installing subversion updates on Stretch (fixes compatibility with security fix for Apache update)
  • 09:07 moritzm: installing jackson-databind security updates
  • 09:01 moritzm: installing openjpeg2 security updates
  • 08:52 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/544209
  • 08:34 Urbanecm: Deploy security patch (T234862)
  • 08:34 vgutierrez: Switch from nginx to ats-tls on cp2022 - T231627
  • 08:30 ema: pool cp4029 with ATS backend T227432
  • 08:20 vgutierrez: Switch from nginx to ats-tls on cp2020 - T231627
  • 08:09 vgutierrez: Switch from nginx to ats-tls on cp2018 - T231627
  • 08:08 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 08:03 godog: swift codfw-prod: final weight to ms-be205[1-6] - T233638
  • 07:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:57 vgutierrez: Switch from nginx to ats-tls on cp3046 - T231627
  • 07:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:50 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4029.ulsfo.wmnet,service=ats-be
  • 07:45 moritzm: installing aspell security updates on jessie
  • 07:43 vgutierrez: Switch from nginx to ats-tls on cp3045 - T231627
  • 07:35 moritzm: installing openjdk-11 security updates
  • 07:32 ema: depool cp4029 and reimage as text_ats T227432
  • 07:15 vgutierrez: Switch from nginx to ats-tls on cp1075 - T231627
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool non partitioned db1089 into s1 special slaves to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9406 and previous config saved to /var/cache/conftool/dbconfig/20191021-070655-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9405 and previous config saved to /var/cache/conftool/dbconfig/20191021-070352-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9404 and previous config saved to /var/cache/conftool/dbconfig/20191021-070119-marostegui.json
  • 06:59 vgutierrez: Switch from nginx to ats-tls on cp2001 - T231627
  • 06:46 vgutierrez: Switch from nginx to ats-tls on cp3030 - T231627
  • 06:28 vgutierrez: Install python3-cryptography-2.6.1-3+deb10u2 on acme-chief hosts - T234131
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9403 and previous config saved to /var/cache/conftool/dbconfig/20191021-061518-marostegui.json
  • 06:12 vgutierrez: Switch cp1086 from nginx to ats-tls - T231433
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1130 on s5 to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9402 and previous config saved to /var/cache/conftool/dbconfig/20191021-055843-marostegui.json
  • 05:54 vgutierrez: Switch cp2017 from nginx to ats-tls - T231433
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9401 and previous config saved to /var/cache/conftool/dbconfig/20191021-055017-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2048 and db2061, those hosts will be decommissioned T228258', diff saved to https://phabricator.wikimedia.org/P9400 and previous config saved to /var/cache/conftool/dbconfig/20191021-054340-marostegui.json
  • 05:42 _joe_: slowly removing service objects from production etcd T233973
  • 05:38 vgutierrez: Switch cp3044 from nginx to ats-tls - T231433
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9399 and previous config saved to /var/cache/conftool/dbconfig/20191021-053737-marostegui.json
  • 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui: Compress tables on db2084:3314 db2091:3312 - T235599
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P9398 and previous config saved to /var/cache/conftool/dbconfig/20191021-052643-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312 db2084:3315 - T235599', diff saved to https://phabricator.wikimedia.org/P9397 and previous config saved to /var/cache/conftool/dbconfig/20191021-052527-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9396 and previous config saved to /var/cache/conftool/dbconfig/20191021-052035-marostegui.json
  • 05:19 vgutierrez: Switch cp4026 from nginx to ats-tls - T231433
  • 05:14 marostegui: Deploy schema change on db1090:3312 T234066 T233135
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312 for schema change and pool db1129 temporarily in vslow, dump', diff saved to https://phabricator.wikimedia.org/P9395 and previous config saved to /var/cache/conftool/dbconfig/20191021-051356-marostegui.json
  • 05:09 marostegui: Deploy schema change on s7 primary master db1062 - T234066 T233135
  • 04:57 vgutierrez: Switch cp5006 from nginx to ats-tls - T231433

2019-10-19

  • 08:41 XioNoX: add user papaul to fasw-c-eqiad
  • 00:05 mutante: LDAP - adding verenali to wmde and nda groups, to match raja_wmde (T233807, T231677)

2019-10-18

  • 22:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet,service=parsoid-php
  • 22:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet,service=parsoid-php
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet,service=parsoid-php
  • 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet,service=parsoid-php
  • 22:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet,service=parsoid-php
  • 22:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet,service=parsoid-php
  • 22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet,service=parsoid-php
  • 22:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet,service=parsoid-php
  • 22:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2032.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet,service=parsoid-php
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet,service=parsoid-php
  • 21:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet,service=parsoid-php
  • 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet,service=parsoid-php
  • 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet,service=parsoid-php
  • 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet,service=parsoid-php
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet,service=parsoid-php
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet,service=parsoid-php
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet,service=parsoid-php
  • 20:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet,service=parsoid-php
  • 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet,service=parsoid-php
  • 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet,service=parsoid-php
  • 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 19:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 18:27 mutante: temp. disabled puppet on all wtp* servers, adding mediawiki appserver roles on them incrementally by re-enabling puppet, starting with wtp1026, scheduled icinga downtime for wtp* all services (T233654)
  • 18:19 mutante: temp. disabling puppet on all wtp* servers
  • 15:40 Urbanecm: Reassign edits from DannyS712 (T235446) to DannyS712 at banwiki (T235446)
  • 15:38 Urbanecm: Run extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=banwiki DannyS712 (T235446)
  • 15:38 Urbanecm: Rename DannyS712@banwiki to DannyS712 (T235446) locally (T235446)
  • 15:07 Urbanecm: Reattach DannyS712@banwiki to DannyS712@SUL (T235446)
  • 14:19 _joe_: uploading cassandra 3.11.4 to stretch-wikimedia
  • 14:10 marostegui: Run compare.py on db1105 - T235877
  • 13:48 jynus: disabled notifications on db1105
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and db1105:3312 host rebooted itself', diff saved to https://phabricator.wikimedia.org/P9392 and previous config saved to /var/cache/conftool/dbconfig/20191018-134517-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2059 from config, host decommissioned', diff saved to https://phabricator.wikimedia.org/P9391 and previous config saved to /var/cache/conftool/dbconfig/20191018-132934-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3315 for tables compression T235599', diff saved to https://phabricator.wikimedia.org/P9390 and previous config saved to /var/cache/conftool/dbconfig/20191018-130253-marostegui.json
  • 13:01 marostegui: Compress db2084:3315 T235599
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P9389 and previous config saved to /var/cache/conftool/dbconfig/20191018-123930-marostegui.json
  • 12:20 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:10 jbond42: disable puppet on puppetmasters to fix puppet-merge
  • 11:58 moritzm: installing sudo security updates for jessie
  • 11:56 Reedy: `mwscript refreshLinks.php banwiki` on mwmaint1002 T235843
  • 11:10 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4028.ulsfo.wmnet,service=ats-be
  • 10:56 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet - T234175
  • 10:53 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet
  • 10:49 effie: Uploading wikidiff2_1.9.0-2~wmf1 to stretch-wikimedia T231586
  • 09:58 moritzm: rolling out debdeploy 0.0.99.12 fleet-wide
  • 09:57 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=echostore
  • 09:40 _joe_: restarting pybal on lvs1015 to pick up the addition of echostore
  • 09:37 ema: pool cp4028 with ATS backend T227432
  • 09:36 _joe_: restarting pybal on lvs2003 to pick up the addition of echostore
  • 09:34 _joe_: restarting pybal on lvs1016 to pick up the addition of echostore
  • 09:20 _joe_: restarting pybal on lvs2006 to pick up the addition of echostore
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=echostore
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 moritzm: importing debdeploy 0.0.99.12 to apt.wikimedia.org
  • 09:13 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:11 _joe_: hotpatching puppet-merge on puppetmaster1001
  • 08:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:32 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:03 ema: depool cp4028 and reimage as text_ats T227432
  • 07:58 marostegui: Deploy schema change on db1076
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P9388 and previous config saved to /var/cache/conftool/dbconfig/20191018-075709-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P9387 and previous config saved to /var/cache/conftool/dbconfig/20191018-075529-marostegui.json
  • 07:21 moritzm: installing unbound security updates on buster
  • 07:20 moritzm: installing libdatetime-timezone-perl updates (time zone updates)
  • 05:53 vgutierrez: switch cp1084 from nginx to ats-tls - T231433
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:32 vgutierrez: switch cp2014 from nginx to ats-tls - T231433
  • 05:19 marostegui: Rename m5 labtestwiki database - T233236
  • 05:15 marostegui: Deploy schema change on db1129 T233135 T234066
  • 05:15 marostegui: Compress tables on db2091:3314 T235599
  • 05:14 vgutierrez: switch cp3039 from nginx to ats-tls - T231433
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P9386 and previous config saved to /var/cache/conftool/dbconfig/20191018-051355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 and db2086:3318 after table compression', diff saved to https://phabricator.wikimedia.org/P9385 and previous config saved to /var/cache/conftool/dbconfig/20191018-050831-marostegui.json
  • 04:57 vgutierrez: switch cp4025 from nginx to ats-tls - T231433
  • 04:34 vgutierrez: switch cp5005 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: restarting nagios-nrpe-server on stat1007

2019-10-17

  • 21:42 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673 (duration: 05m 38s)
  • 21:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673
  • 19:31 eileen: civicrm revision changed from 4eac801762 to ff69d64ad4, config revision is dc3a88889d
  • 18:26 mutante: wtp1025 - cd /srv/deployment/parsoid/deploy/src ; sudo -u deploy-service ln -s ../vendor (for benchmarking test)
  • 18:01 _joe_: depooled wtp1025 from parsoid, parsoid-php to allow running benchmarks there
  • 18:01 elukey: update librdkafka on eventlog1002 and restart eventlogging
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 and remove db1136 from its temporary vslow,dump role', diff saved to https://phabricator.wikimedia.org/P9382 and previous config saved to /var/cache/conftool/dbconfig/20191017-151952-marostegui.json
  • 15:07 dcausse: unbanning elastic1050:psi
  • 15:01 dcausse: dumping jvm heap on elastic1050:psi to investigate gc issues
  • 14:46 moritzm: installing 4.9.189 Linux update on jessie hosts (no reboots, deploying the package only at this point)
  • 14:37 dcausse: banning elastic1050:psi to investigate gc issues
  • 14:32 moritzm: uploaded linux-meta 1.22 for jessie-wikimedia
  • 14:32 bblack: disable puppet on cache fleet (cp*) ahead of cert deployment refactoring - T234803
  • 14:09 cdanis: cdanis@install1002: sudo -E reprepro --restrict grafana update buster-wikimedia
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9381 and previous config saved to /var/cache/conftool/dbconfig/20191017-134112-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9380 and previous config saved to /var/cache/conftool/dbconfig/20191017-133047-marostegui.json
  • 13:06 XioNoX: rollback failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 12:56 XioNoX: restart mr1-eqiad
  • 12:54 XioNoX: downtiming all mgmt hosts for 30min (mr1-eqiad needs to be rebooted)
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9379 and previous config saved to /var/cache/conftool/dbconfig/20191017-125248-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9378 and previous config saved to /var/cache/conftool/dbconfig/20191017-125154-marostegui.json
  • 12:50 marostegui: Compress tables on db2088:3312 - T235599
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9377 and previous config saved to /var/cache/conftool/dbconfig/20191017-124503-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1090:3312 original weight', diff saved to https://phabricator.wikimedia.org/P9376 and previous config saved to /var/cache/conftool/dbconfig/20191017-121330-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9375 and previous config saved to /var/cache/conftool/dbconfig/20191017-121106-marostegui.json
  • 11:39 ema: pool cp4027 with ATS backend T227432
  • 11:36 vgutierrez: upgrading ATS on eqiad nodes to 8.0.5-1wm9 - T234011
  • 11:27 vgutierrez: upgrading ATS on codfw nodes to 8.0.5-1wm9 - T234011
  • 11:27 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4027.ulsfo.wmnet,service=ats-be
  • 11:16 vgutierrez: upgrading ATS on esams nodes to 8.0.5-1wm9 - T234011
  • 11:11 Urbanecm: EU SWAT done
  • 11:11 XioNoX: failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 36d4612: Allow sysops to add transwiki on nnwiki, and add import sources (T231761) (duration: 00m 59s)
  • 11:09 vgutierrez: upgrading ATS on ulsfo nodes to 8.0.5-1wm9 - T234011
  • 11:08 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikibaseMediaInfo: SWAT: 5a67011: Keep track of assigned nodes in both old & new DOM (T235236) (duration: 01m 03s)
  • 10:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 ema: depool cp4027 and reimage as text_ats T227432
  • 10:31 effie: depool mw1333
  • 10:25 elukey: rollback eventlogging back to Python 2, some errors (unseen in tests) logged by the processors
  • 10:24 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3 (duration: 00m 03s)
  • 10:24 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3
  • 10:19 elukey: Move eventlogging on eventlog1002 to Python3
  • 10:17 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3 (duration: 00m 05s)
  • 10:17 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3
  • 09:57 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 09:39 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:38 marostegui: Stop MySQL on db1129 for PDU work
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for PDU work, give some traffic to db1090:3312 meanwhile T227133', diff saved to https://phabricator.wikimedia.org/P9374 and previous config saved to /var/cache/conftool/dbconfig/20191017-093753-marostegui.json
  • 09:27 elukey: upload archiva 2.2.4-1 to stretch-wikimedia - T222595
  • 09:26 marostegui: Stop MySQL on db1117 this will generate some haproxy alerts - T227133
  • 08:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:26 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:05 vgutierrez: upgrading ATS on eqsin nodes to 8.0.5-1wm9 - T234011
  • 08:03 marostegui: Deploy schema change on db1090:3317
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db1136 weight', diff saved to https://phabricator.wikimedia.org/P9373 and previous config saved to /var/cache/conftool/dbconfig/20191017-080157-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 pool db1136 temporarily into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9372 and previous config saved to /var/cache/conftool/dbconfig/20191017-080026-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P9371 and previous config saved to /var/cache/conftool/dbconfig/20191017-074658-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 (non partitioned host) into s5 special group with low weight - T223151', diff saved to https://phabricator.wikimedia.org/P9370 and previous config saved to /var/cache/conftool/dbconfig/20191017-071308-marostegui.json
  • 06:06 elukey: upgrade archiva on archiva1001 to 2.2.4 - T222595
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from x to x100 on s5 - T231018', diff saved to https://phabricator.wikimedia.org/P9369 and previous config saved to /var/cache/conftool/dbconfig/20191017-060251-marostegui.json
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui: Deploy schema change on labtestwiki and labswiki
  • 05:12 marostegui: Deploy schema change on db1095:3312
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 and db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P9368 and previous config saved to /var/cache/conftool/dbconfig/20191017-051055-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 and db1094', diff saved to https://phabricator.wikimedia.org/P9367 and previous config saved to /var/cache/conftool/dbconfig/20191017-050614-marostegui.json
  • 05:01 vgutierrez: upgrading ATS to 8.0.5-1wm9 on cp5001 - T234011
  • 05:00 vgutierrez: uploaded trafficserver 8.0.5-1wm9 to apt.wikimedia.org (stretch) - T234011
  • 02:04 bblack: repooling eqsin
  • 00:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2019-10-16

  • 23:17 Urbanecm: Evening SWAT done
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: Clean expired rules (duration: 00m 58s)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-1.5x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-2x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki.png (T235710)
  • 23:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 9c5bcd8: Change logo for azwiki (T235710) (duration: 00m 59s)
  • 23:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6dc4c0c: New throttle rule for WMCL editathon (T235693) (duration: 00m 59s)
  • 23:09 @: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96c87c7: Enable transwiki import from other Wikipedias on srwikisource (T235419) (duration: 00m 58s)
  • 23:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:00 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 22:42 James_F: Zuul: Add composer-php72-docker for wikimedia-cz/web-theme and wikimedia-cz/web-plugin
  • 22:31 mutante: mwmaint1002 - running generate-fancy-captcha-loop to work around issue with generate-captcha cron (T230245)
  • 22:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:29 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/OutputPage.php: T235711 Lower severity of targets violation back to DEBUG (duration: 00m 59s)
  • 21:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikiEditor: T235701 Revert removal of jquery.tabIndex (duration: 00m 59s)
  • 21:47 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:44 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:42 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:41 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 21:10 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 20:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 20:41 ejegg: rolled back fundraising python tools from 31171f148c to b3c7453be2
  • 20:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/resourceloader/ResourceLoaderStartUpModule.php: Expose StartupModule::getConfigSettings for internal use T235350 T229836 (duration: 00m 59s)
  • 20:07 joal@deploy1001: Finished deploy [analytics/refinery@1704fdd]: Regular analytics weekly train (duration: 17m 06s)
  • 20:00 urandom: upgrading Cassandra to 3.11.4, codfw, rack d -- T200803
  • 19:50 joal@deploy1001: Started deploy [analytics/refinery@1704fdd]: Regular analytics weekly train
  • 19:35 urandom: upgrading Cassandra to 3.11.4, codfw, rack c -- T200803
  • 19:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.34.0-wmf.25 (duration: 03m 24s)
  • 19:18 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix (duration: 05m 53s)
  • 19:13 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix
  • 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.2 refs T233850 (duration: 00m 59s)
  • 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.2 refs T233850
  • 19:06 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint) (duration: 01m 18s)
  • 19:05 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint)
  • 18:46 urandom: upgrading Cassandra to 3.11.4, codfw, rack b -- T200803
  • 18:28 urandom: upgrading Cassandra to 3.11.4, eqiad, rack d -- T200803
  • 18:06 urandom: upgrading Cassandra to 3.11.4, eqiad, rack b -- T200803
  • 16:33 urandom: upgrading Cassandra to 3.11.4, eqiad, rack a -- T200803
  • 16:17 catrope@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/GrowthExperiments/: Fix help panel button alignment (T235578) (duration: 01m 02s)
  • 16:16 mutante: ganeti1003 - shutting down and removing instance moscovium.eqiad.wmnet - recreating under same name with cookbook
  • 15:59 mutante: new dsh group parsoid_php created - parsoid-php servers added to scap / mediawiki-installation dsh group
  • 15:17 marostegui: Deploy schema change on dbstore1004:3312 - T234066 T233135
  • 15:09 marostegui: Recreate views for protected_titles on s2 and s7 on labsdb1009 and labsdb1012 - T233135
  • 15:04 mutante: wtp1025 wtp2001 - scap pull (T233654)
  • 15:04 mutante: wtp parsoid servers added to conftool - wtp1025 and wtp2001 pooled in new service parsoid-php (T233654)
  • 15:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 14:53 effie: Remove tex* and math related packages from deploy*,mwmaint*,snapshot* - T195847
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:26 papaul: power down puppetmaster2001 for HW maintenance
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:24 _joe_: creating namespaces and policies for echostore in codfw, T234376
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:10 moritzm: installing idp2001
  • 13:56 jynus: reenabling puppet on helium T229209
  • 13:46 XioNoX: rollback failover VRRP from cr1-eqiad to cr2-eqiad - T226782
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 and db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P9364 and previous config saved to /var/cache/conftool/dbconfig/20191016-132620-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P9363 and previous config saved to /var/cache/conftool/dbconfig/20191016-131010-marostegui.json
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P9362 and previous config saved to /var/cache/conftool/dbconfig/20191016-125102-marostegui.json
  • 12:38 effie: remove tex* and math related packages from appserver canaries - T195847
  • 12:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540 (duration: 03m 40s)
  • 12:29 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:26 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540
  • 12:20 marostegui: Compress tables on db1099:3311 - T235599
  • 12:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c90503b]: Revert to fix T235540 (duration: 19m 09s)
  • 12:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:00 kart_: Updated cxserver to 2019-10-15-091114-production (T234773, T217585)
  • 11:57 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:56 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c90503b]: Revert to fix T235540
  • 11:49 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT (duration: 10m 13s)
  • 11:46 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:39 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT
  • 11:34 Lucas_WMDE: EU SWAT done
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: extension-list: Load FlaggedRevs via extension.json (T87915, T139800, T140852) (duration: 01m 05s)
  • 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure Citoid+Wikibase integration on Test Wikidata (T228412) (duration: 01m 13s)
  • 11:14 _joe_: purging confd from wtp* servers, not needed anymore
  • 10:48 _joe_: upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run
  • 10:31 elukey: upload prometheus-memcached-exporter 0.4.1+git20181010.2fa99eb-1+deb10u1 to buster-wikimedia - T213089
  • 10:17 marostegui: Stop replication on s2 codfw master for schema change and to modify sanitarium triggers T234066 T233135 T234704
  • 09:40 effie: enable puppet on all hosts running hhvm - T229792
  • 09:36 XioNoX: restart fastnetmon on netflow2001
  • 09:27 effie: Disable puppet on all hosts running hhvm to merge 543131 - T229792
  • 09:22 effie: Disable puppet on mw* hosts to merge 543131
  • 09:20 gehel: force merging commonswiki_content on elasticsearch codfw
  • 08:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:15 _joe_: upgrading envoyproxy in production to 1.11.2 T235412
  • 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9360 and previous config saved to /var/cache/conftool/dbconfig/20191016-052104-marostegui.json
  • 05:18 marostegui: Deploy schema change on s2 sanitarium master (db1074) this will create lag on s2 labsdb T233135 T234066
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P9359 and previous config saved to /var/cache/conftool/dbconfig/20191016-051812-marostegui.json
  • 05:14 marostegui: Change s7 triggers for archive table from db1125:3317 T234704
  • 05:11 marostegui: Change s2 triggers for archive table from db1125:3312 T234704
  • 05:08 marostegui: Deploy schema change on s7 sanitarium master (db1079) this will create lag on s7 labsdb T233135 T234066
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P9358 and previous config saved to /var/cache/conftool/dbconfig/20191016-050627-marostegui.json
  • 03:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465 (duration: 13m 37s)
  • 03:35 mobrovac@deploy1001: Started deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465
  • 01:55 eileen: civicrm revision changed from 5a2f8048c4 to 4eac801762, config revision is dc3a88889d
  • 00:09 mutante: wikitech - make JBond a "content administrator" to give the ability to create server fingerprint pages

2019-10-15

  • 22:41 Reedy: manually running `extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php` T230245
  • 21:26 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Provide getCachableMWConfig() which doesn't rely on wgConf (duration: 01m 00s)
  • 21:24 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408) (duration: 05m 35s)
  • 21:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408)
  • 21:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings: Stop writing wmgScoreFileBackend and wmgScorePath, never read (duration: 00m 59s)
  • 21:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Stop using wmg variables for Score extension (duration: 01m 01s)
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write wgScoreFileBackend and wgScorePath directly, not via CommonSettings (duration: 01m 00s)
  • 20:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.2 refs T233850
  • 19:55 urandom: upgrade restbase2011-{a,b,c} to cassandra 3.11.4 -- T200803
  • 19:52 urandom: upgrade restbase1016-c to cassandra 3.11.4 -- T200803
  • 19:48 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.2 refs T233850 (duration: 27m 39s)
  • 19:48 urandom: upgrade restbase1016-b to cassandra 3.11.4 -- T200803
  • 19:42 urandom: upgrade restbase1016-a to cassandra 3.11.4 -- T200803
  • 19:20 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.2 refs T233850
  • 19:07 mutante: LDAP - adding user rzl to groups wmf and ops (T235215)
  • 17:51 longma: cutting the branch for 1.35.0-wmf.2 T233850
  • 16:28 ejegg: updated payments-wiki from c3cc3ace2f to 570324a30f
  • 16:24 papaul: power down lvs2010 for HW maintenance
  • 16:00 _joe_: uploading envoy 1.11.2 to stretch-wikimedia, buster-wikimedia T230779 T235412
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9355 and previous config saved to /var/cache/conftool/dbconfig/20191015-155454-marostegui.json
  • 15:52 papaul: power down lvs2009 for HW maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9354 and previous config saved to /var/cache/conftool/dbconfig/20191015-154325-marostegui.json
  • 15:17 ejegg: updated payments-wiki from 8a65f57874 to c3cc3ace2f
  • 15:01 moritzm: installing fribidi bugfix updates from stretch point release
  • 14:54 moritzm: installing cups security updates for stretch (client-side libs/tools only)
  • 14:43 elukey: start a root tmux containing a bash script on conf1004 to clean up znodes under /yarn-rmstore/analytics-hadoop/ZKRMStateRoot/RMAppRoot slowly - T217057
  • 14:40 papaul: power down puppetmaster2002 for HW maintenance
  • 14:38 moritzm: installing usbutils update from stretch point release
  • 14:34 elukey: executed 'rmr' in zookeeper on conf1004 for znodes /yarn-leader-election /hadoop-ha /hive_zookeeper_namespace
  • 14:12 ejegg: updated fundraising python tools from b3c7453be2 to 31171f148c
  • 13:53 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9353 and previous config saved to /var/cache/conftool/dbconfig/20191015-130356-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9352 and previous config saved to /var/cache/conftool/dbconfig/20191015-124942-marostegui.json
  • 12:46 elukey: Hadoop maintenance over
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9351 and previous config saved to /var/cache/conftool/dbconfig/20191015-123356-marostegui.json
  • 12:24 mobrovac: restbase add parsoidphp tables in prod - T230792
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9350 and previous config saved to /var/cache/conftool/dbconfig/20191015-121840-marostegui.json
  • 12:17 marostegui: Repool labsdb1009 after PDU maintenance
  • 12:17 elukey: Hadoop maintenance start - migration to the new Zookeeper cluster
  • 12:16 moritzm: installing sudo security updates on buster/stretch
  • 12:13 arturo: add copy of python-pykube and python3-pykube from stretch-wikimedia to buster-wikimedia (T230961)
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 hashar: CI Jenkins restarted
  • 12:04 hashar: Restarting CI Jenkins
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9348 and previous config saved to /var/cache/conftool/dbconfig/20191015-120359-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P9347 and previous config saved to /var/cache/conftool/dbconfig/20191015-120133-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9346 and previous config saved to /var/cache/conftool/dbconfig/20191015-115922-marostegui.json
  • 11:12 Urbanecm: EU SWAT done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ac37540: Add `autopatrol` to translation administrators on mediawiki (duration: 00m 51s)
  • 11:12 jbond42: move puppetmaster_ca_server back to puppetmaster1001
  • 11:08 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip 195.113.145.2 (T235493)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT:855aca4eb: Throttle rule for Czech course (T235493) (duration: 00m 51s)
  • 10:54 moritzm: mark ruby-safe-yaml as manually installed using apt-mark on jessie/stretch, prevents accidental removal of ruby-safe-yaml after puppet 4->5 migration
  • 10:07 moritzm: installing openssl updates for buster (some ciphers we don't use were not enabled due to an upstream change related to the selection of ASM-optimised implementations over generic C)
  • 08:07 marostegui: Stop MySQL on db1126 and labsdb1009 for PDU maintenance - T226782
  • 08:06 elukey: upload new version of memkeys (adding a patch to be merged upstream to avoid segfaults on stretch/buster) to stretch|buster wikimedia apt repos - T223863
  • 07:52 Urbanecm: Set email for `Martin Urbanec (test 10)` to test@wikimedia.cz (debug, no ticket)
  • 07:48 Urbanecm: Password reset for Xaris333 #2 (T235441)
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for PDU maintenance T226782', diff saved to https://phabricator.wikimedia.org/P9345 and previous config saved to /var/cache/conftool/dbconfig/20191015-071338-marostegui.json
  • 07:10 XioNoX: failover VRRP from cr1-eqiad to cr2-eqiad in anticipation of the PDU work - T226782
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 T232446', diff saved to https://phabricator.wikimedia.org/P9344 and previous config saved to /var/cache/conftool/dbconfig/20191015-064419-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1070 T235464', diff saved to https://phabricator.wikimedia.org/P9343 and previous config saved to /var/cache/conftool/dbconfig/20191015-064005-marostegui.json
  • 05:38 marostegui: Depool labsdb1009 for PDU maintenance T226782
  • 05:28 marostegui: Deploy schema change on db1098:3317 T234066 T233135
  • 05:28 marostegui: Deploy schema change on db1097:3314 T233625
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9342 and previous config saved to /var/cache/conftool/dbconfig/20191015-052621-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9341 and previous config saved to /var/cache/conftool/dbconfig/20191015-052220-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P9340 and previous config saved to /var/cache/conftool/dbconfig/20191015-051924-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P9339 and previous config saved to /var/cache/conftool/dbconfig/20191015-051400-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P9338 and previous config saved to /var/cache/conftool/dbconfig/20191015-051236-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1100 to s5 master and remove read-only from s5 T234300', diff saved to https://phabricator.wikimedia.org/P9337 and previous config saved to /var/cache/conftool/dbconfig/20191015-050042-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s5 as read-only for maintenance T234300', diff saved to https://phabricator.wikimedia.org/P9336 and previous config saved to /var/cache/conftool/dbconfig/20191015-050016-marostegui.json
  • 05:00 marostegui: Starting s5 failover from db1070 to db1100 - T234300
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P9335 and previous config saved to /var/cache/conftool/dbconfig/20191015-043403-marostegui.json
  • 04:15 marostegui: Start pre-switchover steps T234300

2019-10-14

  • 23:27 Krinkle: Delete 2019-09-01––2019-09-10 arclamp trace logs from webperf1002, and decompress the rest of 2019-09 (this will trigger svg re-generation), T235425
  • 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 86f12b6e (duration: 00m 51s)
  • 21:47 Krinkle: Deleting 2019-09-01––2019-09-10 arclamp logs on webperf2002, and decompress the rest of 2019-09, T235425
  • 21:12 Krinkle: Delete misc arclamp/logs and arclamp/svgs data from between 2018 and 2019-08 on webperf1002/webperf2002, T235425
  • 20:41 maxsem@deploy1001: Synchronized php-1.35.0-wmf.1/includes/: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542963/ (duration: 00m 55s)
  • 17:56 mutante: webperf2002 - /srv/xenon/logs/daily# gzip 2019-09*excimer*.log (T235425)
  • 17:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates (duration: 16m 45s)
  • 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates
  • 16:07 moritzm: imported cergen 0.2.4-1+deb10u3 to component/cergen for buster-wikimedia T235405
  • 16:00 Urbanecm: Password reset for Xaris333 (T235441)
  • 15:57 moritzm: imported cergen 0.2.4-1+deb10u2 to component/cergen for buster-wikimedia T235405
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9329 and previous config saved to /var/cache/conftool/dbconfig/20191014-142843-marostegui.json
  • 14:28 elukey: upload matomo 3.11 to stretch-wikimedia and upgrade matomo1001 - T234607
  • 14:21 marostegui: Deploy schema change on db1116:3317 T234066 T233135
  • 14:13 effie: Enable puppet on mw* servers and reload apache - T229792
  • 13:48 moritzm: imported cergen 0.2.4-1+deb10u1 to component/cergen for buster-wikimedia T235405
  • 13:42 marostegui: Repool labsdb1009 after PSU replacement - T233273
  • 13:36 effie: Slowly enable puppet on mw* canaries
  • 13:26 moritzm: imported python-networkx 1.11-2~wmf1 to component/cergen for buster-wikimedia T235405
  • 13:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:19 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:18 effie: Disable puppet on mw* to remove php72_only feature flag - T229792
  • 13:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 245b4e5: Add banwiki logo to IS.php (T234768) (duration: 00m 51s)
  • 13:12 Urbanecm: Run git reset --hard origin/master in /srv/mediawiki-stagging (deleted https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542920 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542919 from deployment srv, both don't actually change anything => safe to delete) (T234768)
  • 13:10 marostegui: Sanitize banwiki on db1124:3313 and db2094:3313 T234770
  • 12:44 Amir1: Creating banwiki is banned (done)
  • 12:40 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
  • 12:34 ladsgroup@deploy1001: Synchronized langlist: Creating banwiki: T234768 (duration: 00m 50s)
  • 12:32 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating banwiki: T234768
  • 12:20 ladsgroup@deploy1001: Synchronized dblists: Creating banwiki: T234768 (duration: 00m 52s)
  • 12:10 tarrow@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/Wikibase: SWAT: Bump up Termbox cache version (T235192) (duration: 00m 56s)
  • 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reftabs on testwikidata (T199197, T228412) (duration: 00m 51s)
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a295cc7: Fix wrong domain in wgCopyUploadDomains added in T203363 (T235415) (duration: 00m 51s)
  • 11:27 kart_: Update cxserver to 2019-10-03-054958-production (T232986)
  • 11:22 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:17 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 538867|Use ContentTranslationEnableMT to disable MT (T232986) (duration: 00m 51s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9326 and previous config saved to /var/cache/conftool/dbconfig/20191014-100758-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 into s5 api, db1100 will be removed later in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9325 and previous config saved to /var/cache/conftool/dbconfig/20191014-094809-marostegui.json
  • 09:34 hashar: Upgraded CI jobs to Quibble 0.0.38
  • 09:14 marostegui: Deploy schema change on dbstore1003:3317
  • 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:55 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:52 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 and db2126 after changing sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9322 and previous config saved to /var/cache/conftool/dbconfig/20191014-085143-marostegui.json
  • 08:46 mobrovac: restbase drop metadata keyspaces from cassandra - T235173
  • 07:54 marostegui: Stop db1074 and db2126 in sync to change sanitarium's master for s2 - T231638
  • 07:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata (duration: 03m 58s)
  • 07:45 mobrovac@deploy1001: Started deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata
  • 07:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173 (duration: 13m 37s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db2126 to change sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9320 and previous config saved to /var/cache/conftool/dbconfig/20191014-073319-marostegui.json
  • 07:28 mobrovac@deploy1001: Started deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173
  • 07:28 mobrovac@deploy1001: Finished deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173 (duration: 01m 25s)
  • 07:26 mobrovac@deploy1001: Started deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json
  • 07:16 marostegui: Stop MySQL on labsdb1009 for on-site maintenance - T233273
  • 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)
  • 05:47 marostegui: Remove db2068 from tendril and zarcillo T235399
  • 04:56 marostegui: Depool labsdb1009 for on-site maintenance - T233273
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9318 and previous config saved to /var/cache/conftool/dbconfig/20191014-045629-marostegui.json

2019-10-13

  • 00:52 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ec77b1b (duration: 00m 55s)

2019-10-12

  • 23:21 krinkle@deploy1001: Synchronized wmf-config/profiler.php: bfa8bb69c1f, T231564 (duration: 00m 51s)
  • 21:07 krinkle@deploy1001: Synchronized php-1.35.0-wmf.1/includes/resourceloader/ResourceLoaderStartUpModule.php: 8c6baeae2 (duration: 00m 53s)
  • 20:57 Urbanecm: Reset user email of User:Gardini (T235318)
  • 18:38 _joe_: deleting zotero pods with excessive memory usage in eqiad
  • 16:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: T235334 (duration: 00m 51s)
  • 16:15 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBacklinksprop.php: T235334 (duration: 00m 56s)
  • 04:37 krinkle@deploy1001: Synchronized wmf-config/profiler.php: 29d8469 (duration: 00m 57s)

2019-10-11

  • 15:39 AndyRussG: updated fruec from 18d89675d0 to 1e6a6ee2de
  • 13:57 moritzm: rebooting cloudbackup2001
  • 13:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 12:48 XioNoX: disable SIP ALG on pfw3-eqiad - T235150
  • 12:47 XioNoX: disable SIP ALG on pfw3-codfw - T235150
  • 12:45 moritzm: installing libxslt security updates
  • 12:35 moritzm: installing zsh updates from stretch point release
  • 12:33 moritzm: installing gsoap security updates on stretch
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
  • 12:31 moritzm: installing libcaca security updates on stretch
  • 12:25 XioNoX: push firewall policies to pfw3-eqiad - T235074
  • 12:24 XioNoX: push firewall policies to pfw3-codfw - T235074
  • 11:51 moritzm: installing unzip security updates on stretch
  • 11:08 moritzm: upgrading debdeploy to 0.0.99.11
  • 10:18 moritzm: imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
  • 10:11 hashar: Restarting Gerrit # T224448
  • 10:02 hashar: gerrit: killed a stalled SendEmail thread that was holding a lock
  • 08:34 moritzm: remove kafka2001-2003 from debmonitor DB (T235125)
  • 08:32 moritzm: remove kafka1001-1003 from debmonitor DB (T235125)
  • 08:30 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 moritzm: reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
  • 07:32 XioNoX: rollback two previous HE peering deactivate
  • 07:30 XioNoX: deactivate HE peering on cr2-eqord for packet loss
  • 07:28 XioNoX: deactivate HE peering on cr1-eqiad for packet loss
  • 06:13 marostegui: Compress tables on db2085:3318 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
  • 05:27 papaul: rebooting an-conf1001 for serial troubleshooting
  • 05:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
  • 02:14 mutante: gerrit - "manually" starting replication via ssh command
  • 02:13 mutante: gerrit - restart service to ensure last config change is picked up
  • 02:10 mutante: gerrit1001 - attempt to manually start replication to github

2019-10-10

  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMFMobileFormatterHeadings, unread T232690 (duration: 00m 51s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Update cron-updated miser pages to say they are run periodically, not never (duration: 00m 51s)
  • 22:10 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Remove debug line dating from 2015-12-08! (duration: 00m 51s)
  • 22:04 jforrester@deploy1001: Synchronized wmf-config/mc.php: Drop nutcracker indirection for HHVM servers, just point to localhost (duration: 00m 51s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Drop special-case for PHP7, now always used (duration: 00m 51s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop HHVM special-case for SVG converter, no longer used (duration: 00m 51s)
  • 21:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't check to shard static config cache for HHVM any more (duration: 00m 50s)
  • 21:48 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Don't check to shard wmgWBSharedCacheKey for HHVM any more (duration: 00m 51s)
  • 21:39 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/lib/ve/src/dm/ve.dm.TreeCursor.js: T234881 TreeCursor: cross ignored nodes properly from the end of a text node (duration: 00m 54s)
  • 20:36 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004 (duration: 00m 06s)
  • 20:36 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004
  • 20:13 hoo: Updated the Wikidata property suggester with data from the 2019-09-30 JSON dump and applied the T132839 workarounds
  • 19:33 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 19:29 marxarelli: promoted 1.35.0-wmf.1 to all wikis. no rise in errors rates. no new relevant errors cc: T233849
  • 19:25 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.1
  • 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1
  • 19:09 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s)
  • 19:04 marxarelli: promoting labswiki to 1.35.0-wmf.1 cc: T233849
  • 17:07 jbond42: puppetmaster1001 has been upgraded and is back serving requests
  • 16:21 urandom: Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803
  • 16:18 urandom: Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:16 urandom: Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:11 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:07 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:04 thcipriani: restarting gerrit due to T224448
  • 16:04 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:01 urandom: Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 15:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s)
  • 15:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json
  • 14:54 moritzm: ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage)
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json
  • 14:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 14:03 jbond42: re-enable puppet now ca has been correctly moved
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json
  • 13:50 jbond42: disable puppet fleet wide as puppetmaster2002 is struggling
  • 13:32 jbond42: reimage puppetmaster1001
  • 13:27 marostegui: Repool labsdb1011 after reclone - T235016
  • 13:16 arturo: added flannel 0.5.5-4 to buster-wikimedia (T235059)
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s)
  • 13:00 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 12:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 11:57 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:57 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:48 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:46 jbond@cumin2001: Updating IPMI password on 35 hosts - jbond@cumin2001
  • 11:46 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Fix typo in beta repo data bridge config (T235033) (duration: 00m 59s)
  • 11:40 marostegui: Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135
  • 11:38 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:38 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:38 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:37 arturo: icinga downtime cloudvirt1023 for 2h (T227536)
  • 11:36 arturo: icinga downtime cloudvirt1025 for 2h (T227536)
  • 11:36 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:36 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:36 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:35 arturo: icinga downtime cloudvirt1026 for 2h (T227536)
  • 11:35 marostegui: Stop replication on db2077 to change triggers on db2095:3317 - T234704
  • 11:23 moritzm: installing reportbug updates from stretch point release
  • 11:22 Lucas_WMDE: EU SWAT done
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Set dataBridgeEnabled repo setting on beta (T235033) (affects InitialiseSettings-labs.php and Wikibase.php, but Wikibase.php part is guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:14 Lucas_WMDE: ^ (and by CS, I actually mean Wikibase.php, not CommonSettings.php, sorry)
  • 11:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Rename data bridge config variable names (T235033) (affects IS-labs and CS, but the CS part is all guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 10:38 moritzm: rebalancing Ganeti eqiad/row C after rolling reboots of Ganeti nodes
  • 10:34 volans: uploaded spicerack_0.0.28-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 08:23 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:12 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP - T233654 (duration: 01m 01s)
  • 07:55 marostegui: Stop MySQL on es1014 es1013 db1084 db1083 db1077 db1076 db1112 db1124 db1118 for on-site PDU maintenance (this will generate lag on labsdb hosts) - T227536
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:56 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Drop designate_pool_manager database from m5 - T233978
  • 06:33 marostegui: Revoke privileges from designate user on the designate_pool_manager database - T233978
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for PDU maintenance T227536', diff saved to https://phabricator.wikimedia.org/P9294 and previous config saved to /var/cache/conftool/dbconfig/20191010-055153-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1078 into rc service for s3 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9293 and previous config saved to /var/cache/conftool/dbconfig/20191010-055102-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 db1083 db1076 db1118 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9292 and previous config saved to /var/cache/conftool/dbconfig/20191010-054853-marostegui.json
  • 05:47 marostegui: Depool db1084 db1083 db1076 db1118 for PDU maintenance - T227536
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 marostegui: Deploy schema change on db1061 (s6 eqiad master) - T233135 T234066
  • 04:43 marostegui: Depool labsdb1011 for recloning - T235016
  • 00:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 00:39 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 00:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 00:38 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset

2019-10-09

  • 23:55 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 03m 57s)
  • 23:51 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: (no justification provided)
  • 23:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable AMC on all wikis (T233612) (duration: 00m 58s)
  • 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Turn on AMC outreach modal (T234026) (duration: 00m 59s)
  • 22:01 mutante: restarting gerrit to revert replication config change (T235135)
  • 21:27 godog: swift eqiad-prod: add ms-be105[1-6] - T232367
  • 21:02 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: (no justification provided) (duration: 00m 02s)
  • 21:02 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 21:02 otto@deploy1001: deploy aborted: (no justification provided) (duration: 38m 29s)
  • 20:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006 (duration: 01m 44s)
  • 20:53 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006
  • 20:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds (duration: 02m 42s)
  • 20:41 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds
  • 20:31 papaul: rebooting ms-be1051 to access BIOS
  • 20:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e (duration: 06m 22s)
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 20:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 00m 10s)
  • 20:16 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 05m 34s)
  • 20:10 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:09 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 02m 23s)
  • 20:06 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:56 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 00m 12s)
  • 19:54 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:54 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:52 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 08m 00s)
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:44 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 09m 33s)
  • 19:34 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:25 marxarelli: 1.35.0-wmf.1 promoted to group1, labswiki rolled back to 1.34.0-wmf.25 and to be kept back, cc: T233849
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki rollback to 1.34.0-wmf.25 due to hhvm
  • 19:09 urandom: Upgrade restbase-dev1006-{a,b} to Cassandra 3.11.4 -- T200803
  • 19:09 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.1 (duration: 00m 58s)
  • 19:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.1
  • 18:51 urandom: Upgrade restbase-dev1005-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:45 urandom: Upgrade restbase-dev1004-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:43 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid config changes
  • 17:19 eileen: civicrm revision changed from 2ba100486e to 5a2f8048c4, config revision is 5560cc0878
  • 16:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:48 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9289 and previous config saved to /var/cache/conftool/dbconfig/20191009-160506-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9288 and previous config saved to /var/cache/conftool/dbconfig/20191009-153705-marostegui.json
  • 15:04 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:02 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1085 vslow and dump group', diff saved to https://phabricator.wikimedia.org/P9287 and previous config saved to /var/cache/conftool/dbconfig/20191009-145102-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9286 and previous config saved to /var/cache/conftool/dbconfig/20191009-144928-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9285 and previous config saved to /var/cache/conftool/dbconfig/20191009-144607-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9284 and previous config saved to /var/cache/conftool/dbconfig/20191009-144400-marostegui.json
  • 14:38 elukey: cr1-eqsin: change IPv6 address for BGP peer AS4761
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9283 and previous config saved to /var/cache/conftool/dbconfig/20191009-141137-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9282 and previous config saved to /var/cache/conftool/dbconfig/20191009-140749-marostegui.json
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: rebalancing Ganeti eqiad/row A after rolling reboots of Ganeti nodes
  • 13:48 jbond42: reimage puppetmaster2001
  • 13:37 vgutierrez: repooling cp1085 - T231525
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1075', diff saved to https://phabricator.wikimedia.org/P9280 and previous config saved to /var/cache/conftool/dbconfig/20191009-133709-marostegui.json
  • 13:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928 (duration: 14m 26s)
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9279 and previous config saved to /var/cache/conftool/dbconfig/20191009-125641-marostegui.json
  • 12:42 marostegui: Stop MySQL and power off db1074 for BBU replacement T231638
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9278 and previous config saved to /var/cache/conftool/dbconfig/20191009-124218-marostegui.json
  • 12:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2 (duration: 08m 18s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9277 and previous config saved to /var/cache/conftool/dbconfig/20191009-124035-marostegui.json
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 moritzm: disabled puppet on DNS recursors for staged rollout of ferm NTP change
  • 12:35 jbond42: reimage puppetmaster2002
  • 12:32 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2
  • 12:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928 (duration: 09m 40s)
  • 12:28 vgutierrez: depooling cp1085 for a power drain - T231525
  • 12:20 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928
  • 12:13 moritzm: draining ganeti1001 for upcoming reboot (combined kernel/qemu security updates)
  • 12:10 moritzm: failover Ganeti master in eqiad to ganeti1003
  • 12:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:32 moritzm: draining ganeti1008 for upcoming reboot (combined kernel/qemu security updates)
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 Amir1: EU SWAT is done
  • 11:04 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put write both limit down to Q70m for item terms (T234948) (duration: 01m 10s)
  • 11:04 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:58 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:18 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:44 moritzm: draining ganeti1007 for upcoming reboot (combined kernel/qemu security updates)
  • 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:59 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change, temporarily pool db1085 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9276 and previous config saved to /var/cache/conftool/dbconfig/20191009-085016-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P9275 and previous config saved to /var/cache/conftool/dbconfig/20191009-084732-marostegui.json
  • 08:39 vgutierrez: Switch cp1082 from nginx to ats-tls - T231433
  • 08:24 moritzm: draining ganeti1006 for upcoming reboot (combined kernel/qemu security updates)
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: Switch cp2011 from nginx to ats-tls - T231433
  • 07:48 moritzm: reduced RAM assignment for boron to 8G
  • 07:38 vgutierrez: Switch cp3038 from nginx to ats-tls - T231433
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:34 vgutierrez: switching from nginx to ats-tls on cp4024 - T231433
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013, es1014 T227536 (duration: 01m 00s)
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change - lag will be generated on s6 labs', diff saved to https://phabricator.wikimedia.org/P9274 and previous config saved to /var/cache/conftool/dbconfig/20191009-051911-marostegui.json
  • 05:11 marostegui: Restart gerrit as it is down
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P9273 and previous config saved to /var/cache/conftool/dbconfig/20191009-045941-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312', diff saved to https://phabricator.wikimedia.org/P9272 and previous config saved to /var/cache/conftool/dbconfig/20191009-044752-marostegui.json
  • 04:40 vgutierrez: switching cp5004 from nginx to ats-tls - T231433

2019-10-08

  • 23:28 mutante: phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
  • 23:05 ebernhardson@deploy1001: Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration so it can be more easily overridden (duration: 00m 59s)
  • 21:28 XenoRyet: updated payments-wiki from d2e2637275 to 8a65f57874
  • 21:09 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 20:38 mutante: labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
  • 20:24 mutante: labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php and change "false" to "true" on line 52 to enable LDAP debug logging for T234996
  • 19:51 marxarelli: 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
  • 19:43 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
  • 19:38 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
  • 19:29 shdubsh: adding swagger exporter to apt repo
  • 19:13 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
  • 18:54 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
  • 18:53 godog: codfw-prod: more weight to ms-be205[1-6] - T233638
  • 18:45 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
  • 17:32 marxarelli: cutting wmf/1.35.0-wmf.1
  • 16:17 cstone: civicrm revision changed from db7ef10bfa to 2ba100486e
  • 16:00 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:30 XioNoX: remove 2 more sessions to AS12871 on cr2-esams - T232617
  • 15:20 XioNoX: add BGP sessions to AS199524 on cr2-eqdfw
  • 15:18 XioNoX: add BGP sessions to AS2635 on cr2-eqiad
  • 15:13 XioNoX: renumber BGP session to AS4761 on cr1-eqsin
  • 13:53 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:51 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
  • 13:50 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
  • 13:49 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 marostegui@cumin2001: dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
  • 13:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
  • 13:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
  • 13:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
  • 12:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
  • 12:27 marostegui: Stop MySQL on es1012 for onsite maintenance
  • 12:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:10 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:58 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:57 jbond42: testing IPMI reset cookbook, using the current password for both old and new so no reset actually occurs
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:57 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:22 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:21 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 moritzm: draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
  • 10:16 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
  • 10:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:09 mobrovac@deploy1001: Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
  • 09:20 marostegui: Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
  • 09:09 moritzm: draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 mobrovac@deploy1001: Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
  • 08:38 mobrovac@deploy1001: Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
  • 08:33 elukey: roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
  • 08:10 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:10 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:09 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 07:51 moritzm: draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:49 akosiaris: update OTRS to 5.0.38
  • 07:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
  • 07:10 moritzm: draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
  • 06:48 marostegui: Stop MySQL on es1011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
  • 06:09 marostegui: Repool labsdb1011 after mysql upgrade
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:44 elukey: drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
  • 05:35 elukey: drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
  • 05:25 marostegui: Depool labsdb1011 for mysql upgrade
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
  • 05:10 marostegui: Reload query killer on labsdb1011
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
  • 05:07 marostegui: Deploy schema change on db1097:3315 - T233625
  • 03:04 andrewbogott: restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 — experimental band-aid for T234876
  • 00:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)

2019-10-07

  • 23:52 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:26 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 00m 49s)
  • 23:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:21 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)
  • 22:29 chaomodus: restart nagios-nrpe-server on stat1007
  • 21:56 mutante: gerrit2001 - sudo rm /etc/apache2/sites-available/50-gerrit-slave-wikimedia-org.conf
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Run Labs config after CSP config so it can change it (duration: 00m 51s)
  • 21:20 godog: swift codfw-prod: add ms-be205[3456] - T233638
  • 20:56 XenoRyet: updated payments-wiki from b94da68f7e to d2e2637275
  • 20:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:33 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:29 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add the beta REL1_34 to ExtensionDistributor (duration: 00m 50s)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 Lucas_WMDE: Morning SWAT done
  • 19:09 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/Wikibase: SWAT: Revert "Format coordinates with limited precision" (T174504) (duration: 00m 57s)
  • 18:33 Lucas_WMDE: reopen Morning SWAT for another backport (sorry)
  • 18:26 Urbanecm: Morning SWAT done
  • 18:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: 011b6eb: 11033b7: Update VE core submodule to 2ffb699eb (TreeModifier fixes), T234489, T234742 + ve.ui.MWDefinedTransclusionContextItem: Fix handling of template names (T234817) (duration: 00m 53s)
  • 18:16 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/539978
  • 18:12 andrewbogott: apt dist-upgrade on all cloudvirts (for nova upgrades)
  • 18:12 godog: start swiftrepl eqiad -> codfw (no deletes)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f434ae3: Enable NewUserMessage on sq.wikipedia and sq.wikiquote (T234499) (duration: 00m 52s)
  • 18:07 jgleeson: Updating civicrm from c12f7bb51f to db7ef10bfa
  • 17:46 ottomata: stat1007 is unresponsive, can't login via mgmt either. powercycling.
  • 17:29 XioNoX: add BGP route damping on IX sessions - eqiad - T222424
  • 17:27 XioNoX: add BGP route damping on IX sessions - esams - T222424
  • 17:22 XioNoX: add BGP route damping on IX sessions - eqsin - T222424
  • 15:34 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae (duration: 06m 28s)
  • 15:30 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production'.
  • 15:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production'.
  • 15:27 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae
  • 15:27 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging'.
  • 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop writing wmgVisualEditorEnableNewMobileContext (duration: 00m 51s)
  • 15:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgVisualEditorEnableNewMobileContext (duration: 00m 52s)
  • 14:25 arturo: upgrading openstack in CloudVPS. Some IRC bots and related stuff may be unavailable (T212302)
  • 14:17 marostegui: Deploy schema change on db1139:3316 - T233135 T234066
  • 13:27 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata to write both for item term store (T225055) (duration: 00m 54s)
  • 13:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2 (duration: 06m 38s)
  • 13:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9248 and previous config saved to /var/cache/conftool/dbconfig/20191007-131720-marostegui.json
  • 13:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging (duration: 07m 01s)
  • 13:13 elukey: upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941
  • 13:09 mobrovac@deploy1001: Started deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging
  • 13:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: (no justification provided) (duration: 00m 29s)
  • 13:04 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: (no justification provided)
  • 13:04 mobrovac@deploy1001: deploy aborted: Minor tweaks to VE logging (duration: 01m 07s)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9247 and previous config saved to /var/cache/conftool/dbconfig/20191007-130317-marostegui.json
  • 13:03 mobrovac@deploy1001: Started deploy [restbase/deploy@fe39197]: Minor tweaks to VE logging
  • 12:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restrouter
  • 12:54 elukey: upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941
  • 11:44 Lucas_WMDE: EU SWAT done
  • 11:44 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of main page hack for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:42 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:41 Amir1: another hack bites the dust
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/GrowthExperiments/: SWAT: Homepage: Don't use flexbox for vertical layouts in mobile start module (T234380) (duration: 00m 53s)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on nlwiki (T234685) (duration: 00m 52s)
  • 11:16 arturo: added bdsync 0.11.1-1~wmf1 to buster-wikimedia (T234683)
  • 10:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5 (duration: 04m 17s)
  • 10:55 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5
  • 10:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4 (duration: 04m 27s)
  • 10:50 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4
  • 10:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3 (duration: 03m 53s)
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:31 _joe_: uploading confd 0.16.0 to stretch
  • 10:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2 (duration: 01m 56s)
  • 10:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2
  • 10:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)
  • 10:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772
  • 09:55 marostegui: Deploy schema change on db2129 (s6 codfw master), this will generate lag on s6 codfw - T233135 T234066
  • 08:34 hashar: gerrit: force reindexing all changes ( gerrit index start changes --force )
  • 07:09 marostegui: Remove grants for dbproxy1006 on m1 databases - T231280
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9246 and previous config saved to /var/cache/conftool/dbconfig/20191007-065645-marostegui.json
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1011 T227138 (duration: 01m 10s)
  • 06:08 elukey: upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:25 marostegui: Deploy schema change on db2124 T233135 T234066
  • 05:10 marostegui: The above was for db2095:3316 T234704
  • 05:08 marostegui: Stop replication on db2076 to modify triggers on db2096:3316 T234704
  • 05:02 marostegui: Fix replication on labsdb1011:s8
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9245 and previous config saved to /var/cache/conftool/dbconfig/20191007-045411-marostegui.json

2019-10-06

  • 20:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Racconish /home/urbanecm/T234741 (T234741)
  • 19:15 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy1018, dbproxy1019
  • 06:47 elukey: delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam

2019-10-05

  • 06:48 elukey: force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory

2019-10-04

  • 22:06 mutante: ms-be1020 - power cycle via mgmt - host down
  • 20:43 krinkle@deploy1001: Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)
  • 20:41 mutante: deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)
  • 20:32 mutante: gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)
  • 19:27 mutante: wtp1025 - mediawiki appserver classes are being applied, install in progress will trigger some new icinga alerts
  • 14:03 marostegui: Deploy schema change on db2117 T233135 T234066
  • 13:50 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production'.
  • 13:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production'.
  • 13:36 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging'.
  • 12:28 marostegui: Deploy schema change on db2097:3316 T233135 T234066
  • 12:23 elukey: cleaned up old files and apt-cache from an-coord1001
  • 08:41 marostegui: Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066
  • 08:32 _joe_: reuploading the old confd package to