Difference between revisions of "Server Admin Log"

From Wikitech
imported>Stashbot
(dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99))
imported>Stashbot
(ejegg: changed donations queue consumer and thank you mailer to use 3 minute cycles)

Revision as of 00:49, 4 December 2019

2019-12-04

  • 00:49 ejegg: changed donations queue consumer and thank you mailer to use 3 minute cycles
  • 00:41 twentyafterfour: switching phabricator to read-only mode
  • 00:40 reedy@deploy1001: Synchronized php-1.35.0-wmf.8/skins/Vector/includes/templates/SearchComponent.mustache: I9776a3 (duration: 01m 01s)
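Entries in this log share a regular shape: a bullet, an `HH:MM` timestamp, an actor (optionally followed by `@host`), a colon, and a free-text message. The following is a minimal sketch of splitting one rendered line into those parts. The names `Entry`, `ENTRY_RE` and `parse_entry` are hypothetical and not part of any Wikitech tooling, and the pattern is an assumption derived only from the lines on this page.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One log line, split into its parts (hypothetical structure)."""
    time: str             # "HH:MM" as written in the log
    actor: str            # e.g. "ejegg" or "reedy"
    host: Optional[str]   # e.g. "deploy1001"; None if no "@host" is present
    message: str          # everything after the first ": "


# Assumed pattern: bullet, time, actor, optional @host, ": ", message.
ENTRY_RE = re.compile(
    r"^\s*[•*]\s*(\d{2}:\d{2}) ([^\s:@]+)(?:@([^\s:]+))?: (.*)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for headings and blank lines."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    time, actor, host, message = match.groups()
    return Entry(time, actor, host, message)
```

Date headings such as `2019-12-04` do not match the pattern, so a caller can use the `None` result to detect where a new day begins.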

2019-12-03

  • 23:47 volans: re-enabled meta-monitoring crontabs on wikitech-static after cleanup, reboot and fix wikitech-static's import errors
  • 22:59 volans: apt-get dist-upgrade and reboot of wikitech-static host
  • 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove settings for closed wikis T231178 (duration: 01m 01s)
  • 22:34 volans: disabled temporarily icinga meta-monitoring (disk full on the wikitech-static host)
  • 22:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable the Wikisource extension on frwikisource T239731 (duration: 01m 00s)
  • 22:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Read wmgDoNotRedirectOnSearchMatch to decide to enable auto-redirect search result change T235263 (duration: 01m 00s)
  • 22:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgDoNotRedirectOnSearchMatch, default off, on for Test Commons T235263 (duration: 01m 01s)
  • 22:03 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgXmlDumpSchemaVersion to 0.1.0 everywhere T238921 T174031 (duration: 01m 03s)
  • 21:40 eileen: civicrm revision changed from 26b788378e to 0f51030071, config revision is 17b6730a72 - includes 3 possible performance improvements - logging reduction, cache a query result & cache file existence
  • 21:38 volker-e@deploy1001: Finished deploy [design/style-guide@02a92f7]: Deploy design/style-guide: (duration: 00m 07s)
  • 21:38 volker-e@deploy1001: Started deploy [design/style-guide@02a92f7]: Deploy design/style-guide:
  • 21:09 sbassett: Deployed security patch for T238768 to wmf.8
  • 21:03 sbassett: Deployed security patch for T238768 to wmf.5
  • 20:43 mutante: mw2259 - did not come back from reboot after reimage, also mgmt not reachable (T239054)
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.wmnet
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet
  • 20:17 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns[12]002.wikimedia.org
  • 20:00 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@c21a1ca]: Bump preq version for better logging around MW API timeouts (duration: 05m 46s)
  • 19:54 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@c21a1ca]: Bump preq version for better logging around MW API timeouts
  • 19:53 ejegg: shifted 20 more sec / cycle from donations QC to thank you mailer
  • 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:28 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:22 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:16 Urbanecm: Morning SWAT done
  • 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5c83491: Create translation namespace on nap.wikisource (T239547) (duration: 01m 03s)
  • 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 45edf5a: Add partial blocks for scowiki (T239493) (duration: 01m 00s)
  • 19:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
  • 19:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 19:08 bblack: reimage dns1002 + dns2002 - T239667
  • 19:07 thcipriani@deploy1001: Synchronized scap/plugins: scap: prep and clean git ops for /srv/patches T222240 (no-op sync) (duration: 01m 01s)
  • 17:52 ejegg: disabled PayPal orphan rectifier debug logging
  • 17:48 ejegg: adjusted timing of thank you mailer and donations QC to give 5 more sec / cycle to TY mails
  • 17:43 ejegg: updated fundraising CiviCRM from 4f3341455f to 26b788378e
  • 17:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:19 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:18 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:13 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@498c3d1]: repair bulk daemon swift listings (duration: 05m 49s)
  • 17:07 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@498c3d1]: repair bulk daemon swift listings
  • 16:52 bblack: reimaging dns3002 + dns5002
  • 16:30 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Remove slow result randomization from the suggestions query (duration: 01m 03s)
  • 16:02 ejegg: reduced donations queue consumer 10 sec per cycle and increased TY mail sender 10 sec per cycle
  • 15:54 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:44 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:38 ejegg: updated fundraising CiviCRM from 5cf2d2713f to 4f3341455f
  • 15:34 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 15:20 elukey: executing sudo cumin -b6 -s 20 -p 95 'A:mw-api-eqiad' 'restart-php7.2-fpm' on cumin1001
  • 14:52 godog: swift eqiad-prod: final weight to ms-be105[7-9] - T237438
  • 14:24 ema: all cp-esams hosts switched to digicert-2019a certs T238494
  • 14:19 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 14:17 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 14:13 ema: cp-esams: re-enable puppet, switch to digicert-2019a certs https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554291/ T238494
  • 14:06 ema: repool cp3050 with digicert-2019a T238494
  • 14:00 ema: cp3050: depool and switch to digicert-2019a T238494
  • 13:56 ema: cp-esams: disable puppet in preparation of digicert-2019a cert switch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554291/ T238494
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P9802 and previous config saved to /var/cache/conftool/dbconfig/20191203-133231-marostegui.json
  • 13:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Revert mirroring html2html traffic to PHP - T239643 (duration: 10m 43s)
  • 13:11 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Revert mirroring html2html traffic to PHP - T239643
  • 12:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@41bb230]: Log all html2html errors coming from Parsoid/PHP - T239643 (duration: 14m 41s)
  • 12:28 mobrovac@deploy1001: Started deploy [restbase/deploy@41bb230]: Log all html2html errors coming from Parsoid/PHP - T239643
  • 12:23 mobrovac@deploy1001: Finished deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP, take #2 (duration: 11m 17s)
  • 12:12 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP, take #2
  • 12:12 mobrovac@deploy1001: Finished deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643 (duration: 13m 29s)
  • 12:09 Amir1: EU SWAT is done
  • 12:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set read new for term store for items for client wikis up to Q1000 (T225057) (duration: 01m 00s)
  • 11:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643
  • 11:58 mobrovac@deploy1001: deploy aborted: Mirror html2html traffic to Parsoid/PHP - T229015 T239643 (duration: 00m 00s)
  • 11:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b346ebf]: Mirror html2html traffic to Parsoid/PHP - T229015 T239643
  • 11:36 hashar: Updated operations-puppet-tests-stretch-docker to fix pip cache directory
  • 11:31 godog: refresh kibana fields for logstash-*
  • 11:00 hashar: Updated operations-puppet-tests-stretch-docker CI job to use tox 3.10.0 and support various python 3 versions
  • 10:37 ema: pool cp1083 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 ema: depool cp1083 and reimage as text_ats T227432
  • 09:22 effie: Roll restart php-fpm mw[1240-1258,1261-1275,1319-1333].eqiad.wmnet
  • 09:05 godog: downtime new logstash hosts in codfw/eqiad until thurs
  • 09:02 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:02 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:00 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:48 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 08:45 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:45 effie: Restart php-fpm on mw[1330-1333].eqiad.wmnet
  • 08:45 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 08:45 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:35 ema: cp3050: set cache.max_open_read_retries=-1 and proxy.config.http.cache.max_open_write_retries=1 (default values) T238494
  • 08:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1062 from config T239188 (duration: 01m 02s)
  • 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1062 from config T239188 (duration: 01m 08s)
  • 08:20 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:19 akosiaris: apply calico rules for eventgate-logging-external. T236386
  • 08:18 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 08:14 volker-e@deploy1001: Finished deploy [design/style-guide@7978f0d]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:14 volker-e@deploy1001: Started deploy [design/style-guide@7978f0d]: Deploy design/style-guide:
  • 07:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 06:29 marostegui: Deploy schema change on db1112 with replication (this will generate lag on s3 on labs)
  • 06:19 volker-e@deploy1001: Finished deploy [design/style-guide@8e08740]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:19 volker-e@deploy1001: Started deploy [design/style-guide@8e08740]: Deploy design/style-guide:
  • 06:07 marostegui: Stop MySQL on db1062 for decommissioning T239188
  • 06:00 marostegui: Remove db2065 from tendril and zarcillo T239046
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:50 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=thread T238494
  • 05:47 marostegui: Remove ar_comment triggers from s3 db1124:3313 - T234704
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P9798 and previous config saved to /var/cache/conftool/dbconfig/20191203-054528-marostegui.json
  • 04:19 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/EntryPoint.php: disable IE6 safety checks for T239666 (duration: 01m 00s)
  • 04:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.8/includes/Rest/EntryPoint.php: disable IE6 safety checks for T239666 (duration: 01m 01s)
  • 03:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d00c6ad]: Fix: Apply language headers to zhwiki mobile-html responses (T239659) (duration: 05m 51s)
  • 03:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d00c6ad]: Fix: Apply language headers to zhwiki mobile-html responses (T239659)
  • 02:54 mutante: mw1269 restarted nginx, php
  • 02:48 mutante: mw1320, mw1321 restarted php-fpm
  • 02:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Display 'twice a month' or 'once a month' on cached reports (duration: 01m 19s)
  • 02:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting testwiki => true for wmgUseCentralAuth, already implied by default (duration: 01m 24s)
  • 02:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237698 Stop setting wmgUseDPL, unread (duration: 01m 11s)
  • 02:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T237698 Read wmgUseDynamicPageList not wmgUseDPL (duration: 01m 22s)
  • 02:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237698 Set wmgUseDynamicPageList, less cryptic form of wmgUseDPL (duration: 01m 16s)
  • 02:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgTorLoadNodes, not read for a while (duration: 01m 14s)
  • 02:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgGEHelpPanelSearchEnabled, no longer used (duration: 01m 08s)
  • 02:04 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Enable Translate extension on sewikimedia, second try (duration: 01m 24s)
  • 01:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: T239209 Sanitize HTML on paste (duration: 01m 33s)
  • 01:55 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/VisualEditor/: T239209 Sanitize HTML on paste (duration: 01m 24s)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 01:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
  • 01:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 01:33 mutante: mw2250 - E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
  • 01:33 mutante: mw2252 rebooting
  • 01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2254.codfw.wmnet
  • 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
  • 01:22 mutante: mw2254 - rebooting (reimage script exited with segfault after reimage was done)
  • 01:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.5/includes/diff/DifferenceEngine.php: T236320 Don't calculate amount of inbetween revisions for MCR undo (duration: 00m 59s)
  • 01:15 jforrester@deploy1001: Synchronized dblists/wikidataclient.dblist: T239318 Add sewikimedia to wikidataclient (duration: 01m 03s)
  • 01:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Revert 'Enable Translate extension on sewikimedia' (duration: 01m 01s)
  • 01:00 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.8/extensions/Translate/sql/translate_{…}.sql T239091
  • 00:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239091 Enable Translate extension on sewikimedia (duration: 00m 57s)
  • 00:54 James_F: mwscript sql.php --wiki=sewikimedia php-1.35.0-wmf.5/extensions/Wikibase/client/sql/entity_usage.sql
  • 00:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/Echo/includes/DiscussionParser.php: T239275 Fix type hint fatal from getUserLinks() (duration: 01m 16s)
  • 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-12-02

  • 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
  • 23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
  • 23:05 mutante: mw2248 - restart nginx (for some reason unit was running but not listening on 443 after reimage..now it does)
  • 23:05 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:02 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:46 ejegg: updated payments-wiki from 06a8c3cdff to f61c9f0692
  • 22:44 bblack: reimaging dns4002 to buster - T239667
  • 22:07 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/MachineVision: Update text for no personal uploads message (T238873) (duration: 01m 03s)
  • 22:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
  • 21:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
  • 21:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
  • 21:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:22 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P9796 and previous config saved to /var/cache/conftool/dbconfig/20191202-205904-marostegui.json
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=nginx,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2232.codfw.wmnet,service=apache2,dc=codfw
  • 20:47 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=nginx,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet,dc=codfw,service=apache2,cluster=appserver
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=nginx,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet,service=apache2,cluster=appserver,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=nginx,dc=codfw
  • 20:46 ariel@cumin1001: conftool action : set/pooled=yes; selector: cluster=appserver,name=mw2231.codfw.wmnet,service=apache2,dc=codfw
  • 20:36 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch Flow on all wikis to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 20:35 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 20:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015 (duration: 14m 59s)
  • 20:12 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2) (duration: 00m 05s)
  • 20:12 joal@deploy1001: Started deploy [analytics/refinery@9cd234a] (thin): Analytics deploy - Fixes for today deploy (2)
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2) (duration: 08m 08s)
  • 20:06 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e]: Switch everything to Parsoid/PHP - T229015
  • 20:05 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labslabslabs (duration: 01m 08s)
  • 20:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP (duration: 02m 48s)
  • 20:02 mobrovac@deploy1001: Started deploy [restbase/deploy@92acf1e] (dev-cluster): Switch everything to Parsoid/PHP
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:59 joal@deploy1001: Started deploy [analytics/refinery@9cd234a]: Analytics deploy - Fixes for today deploy (2)
  • 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:56 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:55 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:51 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:50 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:50 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:50 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015 (duration: 13m 48s)
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:23 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5]: Switch everything but enwiki to Parsoid/PHP - T229015
  • 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mobrovac@deploy1001: Finished deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP (duration: 06m 38s)
  • 19:16 mobrovac@deploy1001: Started deploy [restbase/deploy@e69e2e5] (dev-cluster): Switch everything but enwiki to Parsoid/PHP
  • 19:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015 (duration: 14m 11s)
  • 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 mobrovac@deploy1001: Started deploy [restbase/deploy@6a24685]: Parsoid Proxy: Direct html2html traffic to JS; Stop honouring the variant header; Switch sr and zh wikis to PHP - T229015
  • 18:39 joal@deploy1001: Finished deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy (duration: 00m 06s)
  • 18:39 joal@deploy1001: Started deploy [analytics/refinery@980298b] (thin): Analytics deploy - Fixes for today deploy
  • 18:38 joal@deploy1001: Finished deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy (duration: 08m 21s)
  • 18:32 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:30 joal@deploy1001: Started deploy [analytics/refinery@980298b]: Analytics deploy - Fixes for today deploy
  • 18:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes (duration: 15m 42s)
  • 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:00 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@97d17f6]: New blazegraph and WDQS build plus GUI changes
  • 17:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015 (duration: 14m 06s)
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:42 mobrovac@deploy1001: Started deploy [restbase/deploy@ff7862f]: Switch sr and zh wikipediae back to Parsoid/JS - T229015
  • 17:29 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495 (duration: 01m 14s)
  • 17:28 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@deafe56]: Followup on cirrusSearchElasticWrite partitioning T230495
  • 17:21 ssastry@deploy1001: Finished deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy (duration: 07m 48s)
  • 17:14 ssastry@deploy1001: Started deploy [parsoid/deploy@743efb0]: Updating Parsoid to ca588b25 + fix broken langconv library / deploy
  • 17:09 ejegg: disabled fundraising job omnimail_groupmember_load
  • 16:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:43 ejegg: updated fundraising internal dashboard from 8fc2726736 to 3a93d2aba4
  • 16:43 effie: restart all API cluster in eqiad
  • 16:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 hashar: Restarted CI Jenkins
  • 16:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015 (duration: 13m 53s)
  • 16:41 ema: cp3050: ats-be restart with proxy.config.http.server_session_sharing.pool=global T238494
  • 16:32 ema: cp3053: repooling after firmware update T239041
  • 16:27 mobrovac@deploy1001: Started deploy [restbase/deploy@3516382]: Switch ru, sr and zh wikipediae to Parsoid/PHP - T229015
  • 16:19 effie: reimage mw1295.eqiad.wmnet mw1294.eqiad.wmnet mw1293.eqiad.wmnet
  • 16:11 robh: cp3053 depooling and rebooting for firmware update T239041
  • 16:10 robh: cp3035 depooling and rebooting for firmware update T239041
  • 15:38 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:38 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid VRS: Switch groups 0 and 1 to Parsoid/PHP - T229015 (duration: 00m 59s)
  • 15:35 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607 (duration: 14m 51s)
  • 15:26 effie: Rolling restart mw1345-1348
  • 15:15 mobrovac@deploy1001: Started deploy [restbase/deploy@d6d5a6e]: Parsoid Proxy: Do not use the fall-back for linting transforms - T239607
  • 14:46 ema: cp-ats: set server_session_sharing.match=2 everywhere (puppet re-enable and run) T238494
  • 14:31 ema: cp-ats: merge server_session_sharing.match=2 (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553490/) with puppet disabled, test on cp3050 T238494
  • 14:18 godog: set grafana theme back to light, was dark for some reason
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P9794 and previous config saved to /var/cache/conftool/dbconfig/20191202-135643-marostegui.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P9793 and previous config saved to /var/cache/conftool/dbconfig/20191202-135543-marostegui.json
  • 13:47 ema: power-cycle cp3053 T239041
  • 13:44 hashar: Restarted CI Jenkins
  • 13:30 hashar: Restarted CI Jenkins
  • 13:14 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015 (duration: 14m 49s)
  • 13:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38]: Parsoid Proxy: Fixes - T229015
  • 12:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes (duration: 02m 54s)
  • 12:54 mobrovac@deploy1001: Started deploy [restbase/deploy@eedba38] (dev-cluster): Parsoid Proxy: Fixes
  • 12:54 Urbanecm: EU SWAT done
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d27fe78: Enable partial blocks on eswiki (T239370) (duration: 01m 00s)
  • 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 445bdc3: Remove `move-rootuserpages` from user on svwiki (T238842) (duration: 01m 04s)
  • 12:43 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki*.png
  • 12:39 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 61a9563: Revert "Change bawiki logo to an anniversary one" (T237070) (duration: 01m 06s)
  • 12:37 effie: reimage mw1296.eqiad.wmnet
  • 12:37 effie: reimage mw1298.eqiad.wmnet
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set read new for term store for items of wikidata up to Q1000 (T225057) (duration: 01m 00s)
  • 12:19 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.8/extensions/GrowthExperiments/: SWAT: Suggested edits: do not treat AQS lookup failure as error (T238178) (duration: 01m 02s)
  • 11:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:50 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
  • 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 moritzm: installing ruby2.1 security updates
  • 10:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 moritzm: installing python-psutil security updates
  • 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:42 effie: reimage mw1299.eqiad.wmnet
  • 10:18 effie: reimage mw1290.eqiad.wmnet
  • 10:18 effie: reimage mw1275.eqiad.wmnet
  • 10:15 moritzm: installing file/libmagic regression update for jessie
  • 10:08 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
  • 09:52 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:41 joal@deploy1001: Finished deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin) (duration: 00m 08s)
  • 09:41 joal@deploy1001: Started deploy [analytics/refinery@8991301] (thin): Regular analytics deploy - late from last week (thin)
  • 09:40 joal@deploy1001: Finished deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week (duration: 18m 22s)
  • 09:23 effie: reimage mw1300.eqiad.wmnet
  • 09:22 joal@deploy1001: Started deploy [analytics/refinery@8991301]: Regular analytics deploy - late from last week
  • 09:16 moritzm: installing libvpx security updates
  • 09:14 godog: extend graphite LVs on graphite1004 / graphite2003 by 200G
  • 08:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 effie: reimage mw1287.eqiad.wmnet mw1288.eqiad.wmnet mw1289.eqiad.wmnet
  • 08:08 effie: reimage mw1301.eqiad.wmnet
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 andrewbogott: forcing a reboot of cloudstore1008 via mgmt console — it seems to have locked up
  • 06:43 Urbanecm: Clear account creation throttle for several IPs (T239465)
  • 06:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for cawiki workshop (T239465) (duration: 01m 03s)
  • 06:00 marostegui: Compress s8 codfw master (lag might appear on codfw s8)
  • 06:00 marostegui: Compress s4 codfw master (lag might appear on codfw s4)
  • 05:56 marostegui: Deploy schema change on db1075
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P9791 and previous config saved to /var/cache/conftool/dbconfig/20191202-055546-marostegui.json
  • 05:53 marostegui: Compress db1099:3318 T235599
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for compression', diff saved to https://phabricator.wikimedia.org/P9790 and previous config saved to /var/cache/conftool/dbconfig/20191202-055245-marostegui.json

2019-12-01

  • 23:27 ladsgroup@deploy1001: Started restart [mobileapps/deploy@70154b4]: Rolling restart of mobileapps
  • 23:20 bblack: restarting AQS services in eqiad
  • 23:15 eileen: process-control config revision is 9750c318a0 - jobs disabled
  • 21:39 andrewbogott: restarted nova conductor and api on cloudcontrol1003 and 1004 to free up db connections (T239168)

2019-11-30

  • 15:47 Urbanecm: Reset email of SUL user Hayk.arabaget (T239462)
  • 07:40 vgutierrez: repooling cp3057 - T239502
  • 07:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 07:30 vgutierrez: depool and powercycle cp3057 - T239502

2019-11-29

  • 22:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:12 effie: reimage mw1302.eqiad.wmnet
  • 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:19 effie: reimage mw1284.eqiad.wmnet
  • 19:19 effie: reimage mw1303.eqiad.wmnet mw1283.eqiad.wmnet
  • 17:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
  • 16:17 effie: reimage mw1274.eqiad.wmnet
  • 16:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 effie: reimage mw1282.eqiad.wmnet
  • 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 effie: reimage mw1323.eqiad.wmnet mw1297.eqiad.wmnet mw1273.eqiad.wmnet
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 filippo@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
  • 14:13 godog: reimage mw2228 for partman tests
  • 14:02 effie: reimage mw1271.eqiad.wmnet mw1272.eqiad.wmnet mw1304.eqiad.wmnet
  • 13:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 jynus: reenable puppet on dbprov2001, backup1001
  • 13:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 jynus: disabling puppet also on backup1001 to test recoveries
  • 12:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 effie: reimage mw1305.eqiad.wmnet mw1265.eqiad.wmnet mw1270.eqiad.wmnet
  • 11:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:39 jynus: disabling puppet on dbprov2001 to test recoveries
  • 11:34 effie: reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmnet
  • 11:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 Lucas_WMDE: <effie> 10:58:17 log reimage mw1268.eqiad.wmnet mw1280.eqiad.wmnet mw1281.eqiad.wmne
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 elukey@deploy1001: Finished deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts (duration: 00m 08s)
  • 10:47 elukey@deploy1001: Started deploy [analytics/refinery@97015e4] (thin): Deploy thin Analytics Refinery (no jars/git-fat-obj) to notebook and labstore hosts
  • 10:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 effie: reimage mw1306.eqiad.wmnet mw1264.eqiad.wmnet mw1279.eqiad.wmnet
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui: Remove triggers from db2094:3313 - T234704
  • 09:33 marostegui: Stop replication on db2105 (s3 codfw) for schema change
  • 09:23 effie: reimage mw1263.eqiad.wmnet mw1307.eqiad.wmnet
  • 09:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 volans: temporarily disabling puppet on 'R:keyholder::agent' to merge gerrit:operations/puppet/+/553460 - T239386
  • 09:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 effie: reimage mw2223.codfw.wmnet mw2222.codfw.wmnet mw2221.codfw.wmnet mw2220.codfw.wmnet
  • 07:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:25 effie: reimage mw1312.eqiad.wmnet mw1308.eqiad.wmnet mw1261.eqiad.wmnet
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P9781 and previous config saved to /var/cache/conftool/dbconfig/20191129-055845-marostegui.json
  • 05:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.5/includes/exception/MWExceptionHandler.php: 532f4aba96d85 (duration: 01m 03s)

2019-11-28

  • 23:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:21 effie: reimage mw1329.eqiad.wmnet
  • 23:01 effie: restart cp1087
  • 22:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:19 effie: reimage mw1309.eqiad.wmnet
  • 21:19 effie: reimage mw1323.eqiad.wmnet
  • 21:11 effie: reimage mw1316.eqiad.wmnet mw1315.eqiad.wmnet
  • 20:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:03 effie: reimage mw1313.eqiad.wmnet
  • 20:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 effie: reimage mw1331.eqiad.wmnet mw1330.eqiad.wmnet mw1310.eqiad.wmnet
  • 18:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:41 marostegui: Deploy schema change on db1134
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P9780 and previous config saved to /var/cache/conftool/dbconfig/20191128-183918-marostegui.json
  • 18:29 effie: reimage mw1319.eqiad.wmnet mw1318.eqiad.wmnet
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P9779 and previous config saved to /var/cache/conftool/dbconfig/20191128-180517-marostegui.json
  • 17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 effie: reimage mw1340.eqiad.wmnet mw1339.eqiad.wmnet
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:32 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:18 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:58 effie: reimage mw1311.eqiad.wmnet
  • 15:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 effie: reimage mw1333.eqiad.wmnet mw1332.eqiad.wmnet mw1331.eqiad.wmnet
  • 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 effie: reimage mw1343.eqiad.wmnet mw1342.eqiad.wmnet mw1341.eqiad.wmnet
  • 14:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 marostegui: Deploy schema change on s3 codfw on the master, lag will appear on s3 codfw (T234066)
  • 13:57 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 5 (T237984)
  • 13:57 marostegui: Deploy schema change on s4 codfw master with replication - T234066
  • 13:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:37 marostegui: Deploy schema change on db1106 with replication (lag will appear on s1 on labs) - T234066 T233135
  • 13:37 marostegui: Recreate views for enwiki_p.protected_titles for all labsdb hosts - T233135
  • 13:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 phamhi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:33 phamhi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:31 marostegui: Remove ar_comment triggers from db1124:3311 for enwiki.archive - T234704
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change, temporarily pool db1080 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9778 and previous config saved to /var/cache/conftool/dbconfig/20191128-133013-marostegui.json
  • 13:28 volans: cleanup root's crontab entries on netmon hosts from netbox/postgres stuff - T238919
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P9777 and previous config saved to /var/cache/conftool/dbconfig/20191128-132647-marostegui.json
  • 13:21 volans: cumin 'netmon*' 'rm -v /var/spool/cron/crontabs/postgres' T238919
  • 13:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 effie: enable puppet on thumbor*
  • 13:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:51 effie: disable puppet on thumbor*
  • 12:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 effie: reimage mw1267.eqiad.wmnet mw1277.eqiad.wmnet
  • 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:36 effie: reimage mw1344.eqiad.wmnet mw1334.eqiad.wmnet mw1324.eqiad.wmnet
  • 11:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 effie: reimage mw2279 mw2278 mw2277 mw2276 mw2275
  • 10:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 marostegui: Compress labsdb1009
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 09:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:17 effie: reimage mw1266, mw1276
  • 09:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 marostegui: Compress labsdb1011
  • 08:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 marostegui: Remove m4 from tendril and zarcillo - T159170
  • 08:15 effie: reimage mw2280, mw2281, mw2282
  • 08:06 marostegui: Compress labsdb1012
  • 07:56 effie: reimage mw1345, mw1335, mw1325
  • 06:56 elukey: remove log files on an-tool1007 to free root partition space
  • 06:14 marostegui: Remove db1061 from tendril and zarcillo - T238624
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:02 marostegui: Remove db2067 from tendril and zarcillo T233185
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P9776 and previous config saved to /var/cache/conftool/dbconfig/20191128-055212-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P9775 and previous config saved to /var/cache/conftool/dbconfig/20191128-055025-marostegui.json
  • 03:03 vgutierrez: restarting keyholder on acmechief[12]001
  • 01:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:59 mutante: mw2244 restart php-fpm and apache which somehow are returning 5xx after reimage
  • 00:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2019-11-27

  • 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mutante: mw2215 scap pull
  • 21:30 mutante: mw2215 rebooting
  • 21:10 bblack: restarting acme-chief service on acmechief1001 (daemon appears to be stuck on a lock and nonfunctional for days...)
  • 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:14 cstone: payments-wiki revision changed from 2eb54fd6ef to 06a8c3cdff
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P9773 and previous config saved to /var/cache/conftool/dbconfig/20191127-193528-marostegui.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P9772 and previous config saved to /var/cache/conftool/dbconfig/20191127-193227-marostegui.json
  • 19:32 ebernhardson@deploy1001: Finished deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient (duration: 00m 45s)
  • 19:31 ebernhardson@deploy1001: Started deploy [search/airflow@45b7790]: Allow airflow virtualenv to import system site packages to facilitate libmysqlclient
  • 19:27 mutante: an-airflow1001 - apt-get install python3-mysqldb - start airflow-webserver
  • 19:24 ebernhardson@deploy1001: Finished deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package (duration: 00m 42s)
  • 19:23 ebernhardson@deploy1001: Started deploy [search/airflow@f3bad9d]: revert adding mysqlclient python package
  • 19:08 ebernhardson@deploy1001: Finished deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance (duration: 00m 40s)
  • 19:08 ebernhardson@deploy1001: Started deploy [search/airflow@57f4caa]: Install mysqlclient to airflow instance
  • 19:00 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg (it tries to write this on first start and did not have permissions to do so) T236180
  • 18:58 mutante: an-airflow1001: cd /etc/ ; chown airflow airflow; systemctl start airflow-webserver to let airflow write unittests.cfg
  • 18:57 eileen: process-control config revision is b95355c0c0 - repair omnirecipient job off
  • 16:57 andrewbogott: disabling puppet on cloudvirt* and cloudcontrol* while merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552894/
  • 16:50 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external
  • 16:32 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: dd4c76d3d SpecialContributions: max concurrency 3 (instead of 10) T234450 (duration: 01m 17s)
  • 16:22 ejegg: shifted daily silverpop export start time one hour earlier
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P9768 and previous config saved to /var/cache/conftool/dbconfig/20191127-161525-marostegui.json
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P9767 and previous config saved to /var/cache/conftool/dbconfig/20191127-161450-marostegui.json
  • 16:06 ema: cp3050: set proxy.config.http.server_session_sharing.match to "ip" T238494
  • 15:57 _joe_: restarting pybal on lvs1015
  • 15:56 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:55 _joe_: restarting pybal on lvs1016
  • 15:52 jynus: disabling puppet on dbprov1001 to test bacula restore T238048
  • 15:47 papaul: testing redundancy power on scs-a1-codfw
  • 15:47 _joe_: restarting pybal on lvs2003
  • 15:44 _joe_: restarting pybal again on lvs2006
  • 15:42 jynus: migrate db entries of archive Media to backup1001 T238048
  • 15:37 marostegui: Logging retroactively for the record: drop user 'nova'@'%' from m5 - T239170
  • 15:30 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 marostegui: Add grants for dump (10.192.0.114,10.192.16.96) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:27 marostegui: Add grants for dump (10.64.0.95,10.64.16.31) for nova_cell0_eqiad database on db1117:3325 and db2078:3325 - T239170
  • 15:25 _joe_: restarting lvs2006 for addition of eventgate-logging-external,blubberoid-https
  • 15:24 moritzm: installing freetype bugfix updates from Buster 10.2 point release
  • 15:21 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=eventgate-logging-external
  • 15:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 moritzm: downgrading trapperkeeper-webserver-jetty9-clojure packages on puppetdb hosts to the version shipped in Buster 10.2
  • 15:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:02 moritzm: remove trapperkeeper-webserver-jetty9-clojure debs from apt.wikimedia.org/buster-wikimedia (these were needed to unbreak TLS on Puppetdb in Buster, but an update landed in Buster 10.2, which replaces our custom hotfix)
  • 14:56 marostegui: Add new grants for nova_cell0 database on m5 - T239170
  • 14:50 marostegui: Create nova_cell0 database on m5 master - T239170
  • 14:43 effie: reimage mw1346, mw1336, mw1326
  • 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:15 effie: reimage mw2285, mw2284, mw2283
  • 14:14 effie: reimage mw2285, mw2286, mw2283
  • 14:01 moritzm: temporarily stop cas on idp1001 for some failover tests
  • 14:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of testwikidatawiki to read from the new term store for items (T225057) (duration: 00m 56s)
  • 13:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:42 ema: cp1075: repool with tslua reloads enabled T233274
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:28 ema: cp1075: ats-{tls,backend} restarted to apply tslua reload changes T233274
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P9766 and previous config saved to /var/cache/conftool/dbconfig/20191127-132359-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9765 and previous config saved to /var/cache/conftool/dbconfig/20191127-132220-marostegui.json
  • 13:21 effie: reimage mw2288, mw2287, mw2286
  • 13:13 effie: reimage mw1348, mw1338, mw1328
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:51 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=nginx,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:50 jiji@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=apache2,cluster=api_appserver,name=mw2289.codfw.wmnet
  • 12:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 12:26 jiji@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 12:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:18 apergos: reimaged dumpsdata1001 to buster and forgot to use the dang script but it is all ok anyhow :-P
  • 11:47 Amir1: deployed security patch for T237667
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=nginx
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet,service=apache2
  • 11:28 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=nginx
  • 11:27 jiji@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet,service=apache2
  • 11:21 effie: reimage mw2289.codfw.wmnet
  • 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:06 ema: cp1075: depool to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552955/ and test tslua reloads T233274
  • 11:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:04 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 effie: reimage mw1347,mw1337,mw1327 - T239054
  • 10:32 ariel@deploy1001: Finished deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files (duration: 00m 03s)
  • 10:32 ariel@deploy1001: Started deploy [dumps/dumps@e0b0e76]: skip comment lines in dblist files
  • 09:41 moritzm: installing symfony security updates
  • 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 moritzm: installing php-imagick security updates
  • 09:25 ema: cp3050: re-enable request coalescing after performance experiment T238494
  • 09:02 effie: reimage mw1317.eqiad.wmnet - T239054
  • 09:01 marostegui: Stop replication on db1124:3318 to reimport wikidatawiki.page table on labsdb1010 - T238399
  • 08:24 godog: silence codfw varnish traffic drop until dec 9th - T239039
  • 08:09 godog: swift eqiad-prod: more weight to ms-be105[7-9] - T237438
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:53 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:51 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:49 elukey: roll restart of eventstreams on scb2* - T239220
  • 07:41 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:15 vgutierrez: repooling cp3063 - T239310
  • 07:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3063.esams.wmnet
  • 07:04 vgutierrez: depool & powercycle cp3063 - T239310
  • 07:03 marostegui: Compress tables on db1102:3314
  • 06:52 marostegui: Remove db2062 from tendril and zarcillo - T238726
  • 06:50 marostegui: Stop MySQL on db2062 - T238726
  • 06:25 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 06:05 marostegui: Promote db2135 to codfw m5 master T238183
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2135 to the config T238183 (duration: 00m 59s)
  • 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2135 to the config T238183 (duration: 01m 11s)
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2125 T239042', diff saved to https://phabricator.wikimedia.org/P9759 and previous config saved to /var/cache/conftool/dbconfig/20191127-054809-marostegui.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9758 and previous config saved to /var/cache/conftool/dbconfig/20191127-054056-marostegui.json
  • 01:58 krinkle@deploy1001: Synchronized vendor: 4108ff4e2 (3/3) (duration: 01m 00s)
  • 01:56 krinkle@deploy1001: Synchronized wmf-config/: 4108ff4e2 (2/3) (duration: 00m 59s)
  • 01:55 krinkle@deploy1001: Synchronized lib/: 4108ff4e2 (1/3) (duration: 01m 01s)
  • 01:28 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 03s)
  • 00:05 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Show UploadWizard CTA on testcommonswiki (T234960) (duration: 01m 00s)

2019-11-26

  • 23:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WelcomeSurvey for 100% of new users on arwiki (duration: 01m 02s)
  • 23:25 eileen: process-control config revision is ad80b0136c
  • 20:33 jforrester@deploy1001: Synchronized dblists/: Update dblists, now autogenerated (no-op, just comment changes) T223602 (duration: 01m 01s)
  • 20:25 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c282e86]: Followup on T230495 (duration: 00m 59s)
  • 20:24 ebernhardson@deploy1001: Finished deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3 (duration: 00m 42s)
  • 20:24 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c282e86]: Followup on T230495
  • 20:24 ebernhardson@deploy1001: Started deploy [search/airflow@c235ab5]: Rebuild environment for python 3.7.3
  • 20:06 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495 (duration: 01m 23s)
  • 20:05 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2b713d6]: Partition CirrusSearchElasticaWrite jobs T230495
  • 19:59 Pchelolo: create partitioned topics for cirrusSearchElasticaWrite on kafka-main T239135
  • 19:57 Urbanecm: Reset email of TheklanBot (T239233)
  • 19:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.8
  • 19:39 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache (duration: 32m 52s)
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P9753 and previous config saved to /var/cache/conftool/dbconfig/20191126-192724-marostegui.json
  • 19:22 shdubsh: restore codfw logstash to baseline - T215904
  • 19:09 shdubsh: stop logstash codfw, generate some consumer lag, and set batch size to 2000 - T215904
  • 19:07 ebernhardson@deploy1001: Finished deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml (duration: 00m 29s)
  • 19:07 ebernhardson@deploy1001: Started deploy [search/airflow@6ab2cd1]: Align deploy groups in scap.cfg and checks.yaml
  • 19:06 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.8 and rebuild l10n cache
  • 19:04 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.2 (duration: 07m 08s)
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 05s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 19:03 ebernhardson@deploy1001: Finished deploy [search/airflow@d9779a9]: redeploy current version (duration: 00m 02s)
  • 19:03 ebernhardson@deploy1001: Started deploy [search/airflow@d9779a9]: redeploy current version
  • 18:55 shdubsh: stop logstash codfw, generate some consumer lag - T215904
  • 18:44 shdubsh: temporarily update pipeline.batch.size to 1000 on logstash2004 - T215904
  • 18:33 shdubsh: stop logstash on logstash200[5-6] for metrics collection - T215904
  • 18:09 brennen: issues with branch.py branch cut; deleted stub wmf/1.35.0-wmf.8 branch and proceeding with standard process
  • 17:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Show UploadWizard CTA in beta (T234960) (duration: 00m 52s)
  • 17:31 brennen: cutting branch for 1.35.0-wmf.8
  • 17:26 paravoid: moving fiberring from cr3-esams:xe-0/0/2 to cr2-esams:xe-0/1/8
  • 17:25 ppchelko@deploy1001: Finished deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015 (duration: 15m 38s)
  • 17:10 ppchelko@deploy1001: Started deploy [restbase/deploy@0b74625]: Switch group 0 and 1 to Parsoid-PHP T229015
  • 17:03 paravoid: above was for cr3-esams
  • 17:03 paravoid: cr2-esams: disable interface xe-0/0/2 (transit)
  • 16:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop Scribunto special-case for HHVM, never reached T235142 (duration: 00m 52s)
  • 16:32 jforrester@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: Drop HHVMRequestInit symlink creation (duration: 00m 52s)
  • 16:31 James_F: No sane way to delete HHVMRequestInit.php with a simple sync-dir, so waiting for the full scap.
  • 16:30 jforrester@deploy1001: Synchronized docroot/noc/conf/: Drop HHVMRequestInit symlink (duration: 00m 52s)
  • 16:27 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a (duration: 08m 37s)
  • 16:19 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Update Parsoid to 7b9b424a
  • 16:10 ssastry@deploy1001: Finished deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685) (duration: 01m 07s)
  • 16:09 ssastry@deploy1001: Started deploy [parsoid/deploy@ee63341]: Testing rollback fixes (T238685)
  • 16:01 ema: cp3050: temporarily disable request coalescing to assess performance impact T238494
  • 15:15 ema: cp3050: repool after failed test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ (reverted) T238494
  • 14:55 bblack: ignore previous message, restarts not necessary
  • 14:53 bblack: rolling through authdns daemon restarts (necessary to reconfigure ANY-address listener) on authdns1001, authdns2001, ganeti3003
  • 14:44 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Raise memory limit on parsoid servers 2/2 (duration: 00m 52s)
  • 14:42 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Raise memory limit on parsoid servers 1/2 (duration: 00m 51s)
  • 14:30 oblivian@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 14:05 ema: cp3050: depool to merge and test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552862/ T238494
  • 13:11 effie: enable puppet on mediawiki servers
  • 13:03 effie: Remove tmpreaper package from all mediawiki servers - T229792
  • 12:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp (T238918) (duration: 00m 53s)
  • 12:07 XioNoX: power down mr1-esams for replacement - T238174
  • 11:36 elukey: reboot stat1007
  • 11:35 marostegui: Deploy schema change on db1139:3311
  • 11:35 effie: enable puppet on mw canary servers, and restart apaches
  • 10:50 hashar: Updated jenkins job operations-puppet-tests-stretch-docker to use latest Docker container
  • 10:30 godog: swift eqiad-prod: add ms-be105[7-9] - T237438
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9749 and previous config saved to /var/cache/conftool/dbconfig/20191126-102442-marostegui.json
  • 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:07 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:45 effie: Disable puppet on all mediawiki servers to test 489982
  • 09:26 marostegui: Deploy schema change on s8 primary master (db1109) - T234066 T233135 T237120
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into s8 vslow,dump', diff saved to https://phabricator.wikimedia.org/P9748 and previous config saved to /var/cache/conftool/dbconfig/20191126-092409-marostegui.json
  • 09:18 marostegui: Run maintain-views for wikidatawiki.protected_title view on labsdb hosts T233135
  • 07:53 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch Flow to Parsoid/PHP on mw.org -- T229015 (duration: 00m 52s)
  • 07:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266 (duration: 14m 24s)
  • 07:29 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504]: Do not use duplicate filter definitions T234266
  • 07:28 mobrovac@deploy1001: Finished deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions (duration: 07m 36s)
  • 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@378f504] (dev-cluster): Do not use duplicate filter definitions
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1061 from config - T238624', diff saved to https://phabricator.wikimedia.org/P9745 and previous config saved to /var/cache/conftool/dbconfig/20191126-071746-marostegui.json
  • 07:09 marostegui: Stop MySQL on db1061 - T238624
  • 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1061 from config T238624 (duration: 00m 52s)
  • 07:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1061 from config T238624 (duration: 00m 54s)
  • 06:51 marostegui: Run compare.py for db2125 - T239042
  • 06:44 marostegui: Remove triggers for ar_comment on db1124:3318 T234704
  • 06:43 marostegui: Deploy schema change on db1087 with replication, lag will be generated on s8 for labsdb hosts
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, and pool db1092 temporarily as vslow,dump for s8, for a schema change on db1087', diff saved to https://phabricator.wikimedia.org/P9744 and previous config saved to /var/cache/conftool/dbconfig/20191126-064200-marostegui.json
  • 06:34 XioNoX: Rename cr2-knams to cr3-knams - T237030
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1086 on s7 master and remove read-only from s7 T238044', diff saved to https://phabricator.wikimedia.org/P9743 and previous config saved to /var/cache/conftool/dbconfig/20191126-060108-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance T238044', diff saved to https://phabricator.wikimedia.org/P9742 and previous config saved to /var/cache/conftool/dbconfig/20191126-060023-marostegui.json
  • 06:00 marostegui: Starting s7 failover from db1062 to db1086 - T238044
  • 05:49 marostegui: Deploy schema change on dbstore1003:3311
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1086 as it will be the new s7 master - T238044', diff saved to https://phabricator.wikimedia.org/P9741 and previous config saved to /var/cache/conftool/dbconfig/20191126-051034-marostegui.json
  • 05:08 marostegui: Start pre-steps for s7 failover - T238044

2019-11-25

  • 23:39 cstone: payments-wiki revision changed from e4d51fe247 to 2eb54fd6ef
  • 23:14 Urbanecm: Evening SWAT done
  • 23:12 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 23:10 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 01s)
  • 23:09 urbanecm@deploy1001: Synchronized dblists/: SWAT: aed2369: Add gewikimedia to special.dblist (T239173) (duration: 00m 52s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: d71b0ab: kask-echoseen: Do not report dupes (T237143) (duration: 00m 53s)
  • 22:13 Jeff_Green: authdns update to deploy I21ddc1a3e
  • 22:04 eileen: civicrm revision changed from 852c4a36bd to 5cf2d2713f, config revision is c4ad2f5990
  • 20:37 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 20:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 20:07 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:05 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 20:04 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
  • 19:35 mutante: mw1298 - scap pull
  • 19:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 19:30 ema@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet,service=nginx
  • 19:14 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 19:13 cdanis: restarted grafana-server on grafana1002 T220838
  • 19:11 cdanis: copied snapshot of database from grafana1001 to grafana1002 T220838
  • 19:07 cdanis: stopping grafana-next.wikimedia.org (on grafana1002)
  • 19:06 cdanis: making grafana.wikimedia.org read-only (on grafana1001): sudo chmod -w /var/lib/grafana/grafana.db
  • 18:56 Lucas_WMDE: Morning SWAT done
  • 18:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/TemplateData/: SWAT: Implement ParsoidFetchTemplateData hook for Parsoid/PHP (T238954) (duration: 00m 53s)
  • 18:54 bblack: cp[245]*: wipe daemon.log and syslog and restart syslog, again
  • 18:54 ema: cumin -b1 'A:cp-ats and A:esams' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:eqsin' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:53 ema: cumin -b1 'A:cp-ats and A:ulsfo' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:52 ema: cumin -b1 'A:cp-ats and A:codfw' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:51 ema: cumin -b1 'A:cp-ats and A:eqiad' 'run-puppet-agent; ats-backend-restart & ats-tls-restart'
  • 18:50 bblack: cp[245]*: wipe daemon.log and restart syslog, again
  • 18:48 mutante: mw1298 - pooling
  • 18:26 bblack: cp[245]*: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:17 bblack: cp4028: disk space exhausted, rm /var/log/daemon.log + restart rsyslog
  • 18:16 effie: Restart php-fpm on mw* and wtp* servers in eqiad and codfw - T236963
  • 18:07 effie: Upgrade php-wikidiff2 to 1.10.0 to all servers - T236963
  • 17:55 gehel: restart wdqs-updater on all wdqs servers
  • 17:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates (duration: 10m 24s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Parsoid: Switch private wiki clients (Flow, VE) to Parsoid/PHP -- T229015 (duration: 00m 53s)
  • 17:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: Revert New Blazegraph Build and WDQS Updates
  • 17:36 marostegui: Upgrade kernel on db2125 T239042
  • 17:25 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates (duration: 12m 23s)
  • 17:19 XioNoX: power down cr2-knams - T237030
  • 17:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa (duration: 08m 58s)
  • 17:12 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c5f503]: New Blazegraph Build and WDQS Updates
  • 17:05 arlolra@deploy1001: Started deploy [parsoid/deploy@e7faa19]: Updating Parsoid to a6bfdfa
  • 16:48 jynus: upgrading and restarting dbprov* hosts
  • 15:49 ema: pool cp3064 with varnish-be T227432
  • 15:36 ema: cp3064 create filesystem on /dev/nvme0n1p1 (see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/552547/) and reboot T238494
  • 15:22 ema: cp3064 manual reboot after wmf-auto-reimage error: 'Unable to run wmf-auto-reimage-host: Failed to reboot_host' T238494
  • 15:20 ema: cp-ats: rolling ats-{tls,backend} restart to enable lua reload T233274
  • 15:18 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:14 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 ema: cp1075: ats-tls-restart to enable lua reload T233274
  • 15:10 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:09 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 ema: cp1075: ats-backend-restart to enable lua reload T233274
  • 15:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 15:00 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp3056.esams.wmnet,service=ats-be
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:50 XioNoX: enable cr3-esams:et-1/0/0 - T236767
  • 14:45 ema: depool cp3064 and reimage with varnish-be T227432
  • 14:44 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 14:38 marostegui: Remove triggers from archive table on s1 codfw sanitarium T234704
  • 14:37 marostegui: Deploy schema change on s1 codfw (this will generate lag on codfw) - T234066 T233135
  • 14:23 moritzm: upgrading OpenJDK 11 on an-conf*
  • 14:04 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 elukey: set global read_only=1 on db1108's log database - T159170
  • 13:16 XioNoX: cleanup config on cr3-esams - T237031
  • 13:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:11 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 13:06 XioNoX: cleanup config on cr2-esams - T237031
  • 13:02 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:59 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 12:48 XioNoX: bundle esams-knams links on knams side - T237031
  • 12:42 XioNoX: bundle esams-knams links on esams side - T237031
  • 12:27 XioNoX: disable BGP to knams transits - T237031
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Increase main traffic weight for db1126', diff saved to https://phabricator.wikimedia.org/P9735 and previous config saved to /var/cache/conftool/dbconfig/20191125-114821-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P9734 and previous config saved to /var/cache/conftool/dbconfig/20191125-114733-marostegui.json
  • 11:40 effie: cumin -b 2 -s 10 restart php on API servers
  • 11:31 effie: restart php-fpm on mw1314
  • 11:16 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/AbuseFilter/extension.json: SWAT: 29a16bd: Restrict viewing Special:Log/AbuseFilter, and remove from recent changes (T34959) (duration: 01m 04s)
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 4670d1d: Add throttle rule for WMCL Editathon 2019-12-07 (T238986) (duration: 00m 53s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9394f1f: Allow enwikiversity interface admins to remove their own interface administratorship (T238967) (duration: 00m 57s)
  • 09:45 moritzm: installing cron updates from buster point release
  • 09:32 moritzm: installing systemd security/bugfix updates on buster
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - schema change', diff saved to https://phabricator.wikimedia.org/P9732 and previous config saved to /var/cache/conftool/dbconfig/20191125-093157-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P9731 and previous config saved to /var/cache/conftool/dbconfig/20191125-093038-marostegui.json
  • 09:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: T238822 (duration: 13m 08s)
  • 09:28 _joe_: building and publishing updated images for envoy
  • 09:17 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: T238822
  • 09:13 moritzm: installing python2.7 updates on buster
  • 08:53 _joe_: rebuilding base docker images docker-registry.wikimedia.org/wikimedia-{jessie,stretch,buster}
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 marostegui: Compress db2090
  • 07:04 marostegui: Upgrade db2134
  • 06:24 marostegui: Compress db2080
  • 06:23 marostegui: Compress db2082
  • 06:22 marostegui: Compress db2094:3318
  • 06:18 marostegui: racadm serveraction hardreset on db2125 T239042
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 - schema change', diff saved to https://phabricator.wikimedia.org/P9730 and previous config saved to /var/cache/conftool/dbconfig/20191125-061629-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9729 and previous config saved to /var/cache/conftool/dbconfig/20191125-061542-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9728 and previous config saved to /var/cache/conftool/dbconfig/20191125-060728-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9727 and previous config saved to /var/cache/conftool/dbconfig/20191125-060011-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed T239042', diff saved to https://phabricator.wikimedia.org/P9726 and previous config saved to /var/cache/conftool/dbconfig/20191125-055813-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P9725 and previous config saved to /var/cache/conftool/dbconfig/20191125-055305-marostegui.json
  • 03:13 vgutierrez: repooling cp3053 - T239041
  • 03:00 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3053.esams.wmnet
  • 02:59 vgutierrez: depooling & power-cycling cp3053 - T239041
  • 00:10 eileen: also speed the repair process-control config revision is c4ad2f5990

2019-11-24

  • 20:54 eileen: process-control config revision is 371782a667
  • 15:41 ariel@deploy1001: Finished deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps (duration: 00m 03s)
  • 15:41 ariel@deploy1001: Started deploy [dumps/dumps@bfdea34]: can skip locks for misc dumps
  • 15:01 apergos: rebooting dumpsdata1002 to clear up the other half of the nfs issues
  • 14:24 apergos: rebooting snapshot1008 to clear up some nfs + kernel issues

2019-11-23

  • 18:19 gehel: repool wdqs1007, caught up on lag - T238229
  • 14:23 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 55s)
  • 11:56 _joe_: oblivian@cumin1001:~$ sudo cumin -b2 -s60 A:mw-eqiad 'restart-php7.2-fpm'
  • 11:47 _joe_: restarting php7.2-fpm on mw1329
  • 09:49 XioNoX: downtime all ripe-atlas checks until Monday (most likely an upstream issue/maintenance)

2019-11-22

  • 21:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238955 (duration: 00m 53s)
  • 18:02 shdubsh: restore prometheus services default settings - T238807
  • 17:52 _joe_: repooling restbase2018
  • 17:36 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:34 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 shdubsh: clean tombstones on prometheus1004 - T238807
  • 17:09 shdubsh: restart prometheus on prometheus1004 - T238807
  • 16:22 shdubsh: clean tombstones on prometheus1003 - T238807
  • 15:40 XioNoX: renumber AS17639 sessions in eqsin
  • 15:16 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/: Stop outputting anything in case of 304 responses in Special:EntityData (T238901) (duration: 00m 57s)
  • 14:49 _joe_: disabling puppet on restbase2018, testing envoy upgrade T238050
  • 14:48 _joe_: uploaded envoyproxy 1.12.1 to {buster,stretch} T237235
  • 13:11 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T238119 T238524 T237375 T238120)
  • 13:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: T238473 (duration: 00m 52s)
  • 12:34 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)
  • 12:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)
  • 11:59 effie: reload php7 on canaries
  • 11:34 effie: Roll out wikidiff2 1.10.0-1 to canaries - T236963
  • 11:29 effie: upload wikidiff2 1.10.0-1 - T236963
  • 09:59 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 10s)
  • 09:56 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 51s)
  • 09:47 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
  • 09:44 ladsgroup@deploy1001: Synchronized langlist: T238104 (duration: 00m 52s)
  • 09:28 ema: pool cp1081 with ATS backend T227432
  • 09:27 gehel: depool wdqs1007 to allow it to catch up on lag - T238229
  • 09:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/includes/specials/pagers/ContribsPager.php: Remove live hack of limit for T234450 (duration: 00m 54s)
  • 09:19 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T234450 (duration: 00m 55s)
  • 09:07 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:04 gehel: remove blazegraph 2.1.5-wmf.11 from archiva, broken upload
  • 08:54 gehel: restarting blazegraph and updater on wdqs1007
  • 08:54 gehel: restarting blazegraph and updater on edqs1007
  • 08:49 ema: depool cp1081 and reimage as text_ats T227432
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Rebalance weights on s7 in preparation for s7 failover on Tuesday T238044', diff saved to https://phabricator.wikimedia.org/P9722 and previous config saved to /var/cache/conftool/dbconfig/20191122-063145-marostegui.json
  • 03:49 shdubsh: restart prometheus@ops on prometheus1003 T238807
  • 00:46 mutante: xhgui1001/xhgui2001 - rsyncing /srv/mongod from tungsten to /srv/tungsten/mongod/ on both new machines (T158837)
  • 00:37 mutante: tungsten - starting ferm service
  • 00:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move newcomer tasks JSON config from mw.org to local wikis (T237301) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Make non-remote titles work in RemotePageConfigurationLoader (T237301) (duration: 00m 54s)

2019-11-21

  • 23:09 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused CirrusSearch config variable (duration: 00m 52s)
  • 22:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --overwrite --user=Bürgerentscheid . (T238764)
  • 21:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Revert "Add Machine Vision CTA to final step (T234960)", take 2 (duration: 00m 41s)
  • 21:36 mholloway-shell@deploy1001: Scap failed!: 5/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:34 mholloway-shell@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 21:29 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/UploadWizard: Add Machine Vision CTA to final step (T234960) (duration: 00m 59s)
  • 21:16 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88 (duration: 06m 29s)
  • 21:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@70154b4]: Update mobileapps to c140e88
  • 20:51 mutante: puppetmaster1001 - revoking puppet certs for xhgui1001/xhgui2001
  • 20:49 mutante: ganeti1003 - switching boot order of xhgui1001 to network and reinstalling with stretch (T238098)
  • 20:16 mforns@deploy1001: Finished deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist (duration: 08m 29s)
  • 20:14 mutante: icinga1001 - systemctl reset-failed
  • 20:08 mforns@deploy1001: Started deploy [analytics/refinery@97015e4]: add new projects to webrequest whitelist
  • 19:01 andrewbogott: upgrading designate to 'ocata' on cloudservices1003 and 1004
  • 18:49 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:45 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:13 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis back to Parsoid/JS - T229015 (duration: 00m 52s)
  • 18:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:02 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Use HTTPS for contacting Parsoid/PHP - T229015 (duration: 00m 53s)
  • 17:52 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Switch private wikis to Parsoid/PHP; file 4/4 -- T229015 (duration: 00m 53s)
  • 17:51 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch private wikis to Parsoid/PHP; file 3/4 -- T229015 (duration: 00m 51s)
  • 17:50 mobrovac@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch private wikis to Parsoid/PHP; file 2/4 -- T229015 (duration: 00m 53s)
  • 17:48 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: Switch private wikis to Parsoid/PHP; file 1/4 -- T229015 (duration: 00m 53s)
  • 17:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015 (duration: 16m 43s)
  • 17:10 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068]: Switch mw.org to Parsoid/PHP - T229015
  • 17:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP (duration: 02m 38s)
  • 17:06 mobrovac@deploy1001: Started deploy [restbase/deploy@b987068] (dev-cluster): Switch mw.org to Parsoid/PHP
  • 16:54 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 16:48 sbassett@deploy1001: Finished scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues. (duration: 16m 42s)
  • 16:31 sbassett@deploy1001: Started scap: Deploying T238451 (ext:AbuseFilter), running scap sync for i18n issues.
  • 15:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 15:42 mforns@deploy1001: Finished deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107) (duration: 10m 50s)
  • 15:31 mforns@deploy1001: Started deploy [analytics/refinery@7f32472]: deploying analytics refinery (after refinery-source v0.0.107)
  • 15:30 ema: pool cp1079 with ATS backend T227432
  • 15:22 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:19 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:13 akosiaris: purge https://releases.wikimedia.org/charts/eventgate-0.0.13.tgz, https://releases.wikimedia.org/charts/ and https://releases.wikimedia.org/charts/index.yaml
  • 15:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 bblack: DONE testing deployment software changes on authdns cluster, back to normal
  • 15:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 ema: depool cp1079 and reimage as text_ats T227432
  • 14:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@db43901]: Agent filter changes (duration: 18m 33s)
  • 14:43 bblack: testing deployment software changes on authdns cluster, please hold dns changes for a few!
  • 14:41 thcipriani: restarting Jenkins for update
  • 14:28 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@db43901]: Agent filter changes
  • 13:59 ema: pool cp1077 with ATS backend T227432
  • 13:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:39 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:20 ema: depool cp1077 and reimage as text_ats T227432
  • 11:53 reedy@deploy1001: Finished scap: T234450 (duration: 19m 20s)
  • 11:42 effie: enable puppet on all mw hosts
  • 11:33 reedy@deploy1001: Started scap: T234450
  • 11:09 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e4861ec: Set correct language for shywiktionary (T238105) (duration: 00m 52s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 68d2003: Restrict editing CNBanner namespace to autoconfirmed on metawiki (T238723) (duration: 00m 54s)
  • 11:05 effie: disable puppet on mw[1-2]*
  • 10:49 volans: restarting tcpircbot-logmsgbot on icinga1001, has failed to log some messages, no useful log on the host
  • 10:22 ema: pool cp2023 with Varnish backend T238817 T227432
  • 10:18 arturo: update buster-wikimedia thirdparty/kubeadm-k8s packages (newer version will be used to handle T238654)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9714 and previous config saved to /var/cache/conftool/dbconfig/20191121-095401-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1090:331{2,7} after upgrade', diff saved to https://phabricator.wikimedia.org/P9713 and previous config saved to /var/cache/conftool/dbconfig/20191121-093958-marostegui.json
  • 09:39 ema: depool cp2023 and reimage back as varnish-be T238817 T227432
  • 09:38 marostegui: Stop MySQL on db1067 - T238297
  • 09:27 marostegui: Upgrade db1090:3312, db1090:3317
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P9712 and previous config saved to /var/cache/conftool/dbconfig/20191121-092554-marostegui.json
  • 09:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9711 and previous config saved to /var/cache/conftool/dbconfig/20191121-090623-marostegui.json
  • 09:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 08:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9710 and previous config saved to /var/cache/conftool/dbconfig/20191121-085644-marostegui.json
  • 08:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9709 and previous config saved to /var/cache/conftool/dbconfig/20191121-084500-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 after upgrade', diff saved to https://phabricator.wikimedia.org/P9708 and previous config saved to /var/cache/conftool/dbconfig/20191121-083322-marostegui.json
  • 08:21 marostegui: Upgrade db1079
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for upgrade', diff saved to https://phabricator.wikimedia.org/P9707 and previous config saved to /var/cache/conftool/dbconfig/20191121-082108-marostegui.json
  • 07:57 akosiaris: upgrade OTRS to 5.0.39 T225925
  • 07:56 marostegui: Promote db2133 to codfw m2 master - T238183
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9706 and previous config saved to /var/cache/conftool/dbconfig/20191121-072543-marostegui.json
  • 07:18 marostegui: Upgrade db1125 (sanitarium)
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9705 and previous config saved to /var/cache/conftool/dbconfig/20191121-071758-marostegui.json
  • 06:56 marostegui: Repool labsdb1009
  • 06:32 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db1124:3313 T238115 T238114 T237373 T238522 T236404
  • 06:30 marostegui: Sanitize shywiktionary gcrwiki szywiki minwiktionary gewikimedia on db2094:3313 T238115 T238114 T237373 T238522 T236404
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9704 and previous config saved to /var/cache/conftool/dbconfig/20191121-062412-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after upgrade', diff saved to https://phabricator.wikimedia.org/P9703 and previous config saved to /var/cache/conftool/dbconfig/20191121-061711-marostegui.json
  • 06:16 marostegui: Compress db2081
  • 06:13 marostegui: Stop MySQL on db1107 T238113
  • 06:06 marostegui: Compress db2083
  • 05:57 marostegui: Depool labsdb1009 for upgrade
  • 05:56 marostegui: Upgrade db1086
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for upgrade', diff saved to https://phabricator.wikimedia.org/P9702 and previous config saved to /var/cache/conftool/dbconfig/20191121-055557-marostegui.json
  • 05:53 marostegui: Compress db2073
  • 00:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config does not seem to be applying on half the app servers, resyncing (duration: 00m 52s)
  • 00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable suggested edits without opt-in (T227728) (duration: 00m 52s)
  • 00:18 catrope@deploy1001: Finished scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n) (duration: 15m 57s)
  • 00:02 catrope@deploy1001: Started scap: GrowthExperiments and MobileFrontend changes SWAT (includes i18n)

2019-11-20

  • 23:14 Amir1: finished creating five wikis, total duration 134 minutes
  • 23:14 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
  • 23:11 ladsgroup@deploy1001: Synchronized langlist: T238105 (duration: 00m 50s)
  • 23:10 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238105 (duration: 00m 52s)
  • 23:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238105 (duration: 00m 51s)
  • 23:08 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238105 (duration: 00m 51s)
  • 23:05 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238105
  • 22:59 ladsgroup@deploy1001: Synchronized dblists: T238105 (duration: 00m 53s)
  • 22:49 ladsgroup@deploy1001: Synchronized langlist: T238104 (duration: 00m 51s)
  • 22:48 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T238104 (duration: 00m 52s)
  • 22:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238104 (duration: 00m 52s)
  • 22:43 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T238104 (duration: 00m 51s)
  • 22:41 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T238104
  • 22:36 ladsgroup@deploy1001: Synchronized dblists: T238104 (duration: 00m 52s)
  • 22:22 ladsgroup@deploy1001: Synchronized langlist: T237369 (duration: 00m 53s)
  • 22:21 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T237369 (duration: 00m 52s)
  • 22:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T237369 (duration: 00m 51s)
  • 22:17 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T237369 (duration: 00m 51s)
  • 22:15 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T237369
  • 22:11 ladsgroup@deploy1001: Synchronized dblists: T237369 (duration: 00m 52s)
  • 22:00 Urbanecm: Wiki creation continues
  • 21:56 ladsgroup@deploy1001: Synchronized langlist: T236861 (duration: 00m 52s)
  • 21:55 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T236861 (duration: 00m 51s)
  • 21:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236861 (duration: 00m 52s)
  • 21:52 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T236861 (duration: 00m 51s)
  • 21:49 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T236861
  • 21:44 ladsgroup@deploy1001: Synchronized dblists: T236861 (duration: 00m 52s)
  • 21:38 Urbanecm: mwscript createAndPromote.php --wiki=gewikimedia --sysop --bureaucrat Mehman97 <password redacted> (T236389)
  • 21:35 gehel: repool wdqs1004 - T238229
  • 21:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:29 urbanecm@deploy1001: Synchronized static/images/project-logos/: new wiki gewikimedia (T236389) (duration: 00m 53s)
  • 21:28 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:27 ejegg: Fundraising CiviCRM updated from 2802bdd649 to 852c4a36bd
  • 21:23 mutante: notebook1003 - systemctl start nagios-nrpe-server (second time today already, T212824)
  • 21:20 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: new wiki gewikimedia (T236389)
  • 21:16 urbanecm@deploy1001: Synchronized dblists: new wiki gewikimedia (T236389) (duration: 00m 52s)
  • 21:01 ssastry@deploy1001: Finished deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix (duration: 07m 20s)
  • 20:53 ssastry@deploy1001: Started deploy [parsoid/deploy@7665624]: Dummy Parsoid deploy to test T238748 fix
  • 20:37 ssastry@deploy1001: Finished deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d (duration: 09m 14s)
  • 20:27 ssastry@deploy1001: Started deploy [parsoid/deploy@d5646b7]: Updating Parsoid to 2e79460d
  • 20:27 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 20:23 mutante: notebook1003 - sudo systemctl nagios-nrpe-server (as usual ....)
  • 20:19 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:31 ejegg: updated fundraising internal dashboard from 69fdbec60d to 8fc2726736
  • 19:04 mutante: xhgui1001 - initial puppet run, signed puppet cert on puppetmaster1001
  • 18:56 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 50s)
  • 18:51 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 54s)
  • 18:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 170 (duration: 00m 53s)
  • 18:31 mutante: ganeti - introducing and installing buster on new VMs xhgui1001/xhgui2001 - for replacing tungsten (jessie) T238098
  • 18:17 mobrovac: morning SWAT done
  • 18:17 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.5/includes/libs/virtualrest/ParsoidVirtualRESTService.php: Parsoid VRS: Add the Host header - T229015 T229078 T229074 (duration: 00m 52s)
  • 18:13 shdubsh: restart mtail on fermium
  • 17:40 ema: pool cp2023 with ATS backend T227432
  • 17:24 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:21 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 17:19 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 17:18 andrewbogott: upgrading pdns to version 4 on cloudservices1003
  • 17:06 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:04 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:03 andrewbogott: upgrading pdns to version 4 on cloudvirt1004 T210715
  • 16:58 andrewbogott: disabling puppet on cloudvirt1003 and 1004 for T210715
  • 16:55 moritzm: installing rpcbind bugfix updates from buster 10.2 point release
  • 16:43 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:23 ema: depool cp2023 and reimage as text_ats T227432
  • 16:14 ema: pool cp2019 with ATS backend T227432
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9695 and previous config saved to /var/cache/conftool/dbconfig/20191120-160813-marostegui.json
  • 16:03 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 15:42 mobrovac@deploy1001: Synchronized wmf-config/LabsServices.php: [BETA-ONLY] Switch Flow to use Parsoid/PHP - T229078 (duration: 00m 52s)
  • 15:40 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:38 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 15:19 ema: depool cp2019 and reimage as text_ats T227432
  • 15:08 gehel: reset LVS weight for wdqs public eqiad to 10
  • 15:05 effie: Enable puppet on mw*
  • 14:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 gerrit:552069 (duration: 00m 52s)
  • 14:50 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (use altered lag, not raw lag) gerrit:552072 (duration: 00m 53s)
  • 14:49 ema: pool cp2016 with ATS backend T227432
  • 14:47 effie: disable puppet on all mw* servers
  • 14:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:06 ema: depool cp2016 and reimage as text_ats T227432
  • 13:32 godog: updated puppet compiler facts on compiler100* hosts
  • 12:43 ema: pool cp2013 with ATS backend T227432
  • 12:27 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:08 ema: depool cp2013 and reimage as text_ats T227432
  • 11:59 ema: pool cp2012 with ATS backend T227432
  • 11:55 Urbanecm: EU SWAT done
  • 11:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2b13fbe: [rowiki] Enable deleterevision for patrollers (T234051) (duration: 00m 52s)
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 51ecd71: Partial cleanup of InitializeSettings (T231178) (duration: 00m 52s)
  • 11:42 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:40 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f847380: Set namespace alias for Index: (NS 102/103) for elwikisource (T237253) (duration: 00m 54s)
  • 11:36 urbanecm@deploy1001: Finished scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes (duration: 06m 15s)
  • 11:30 urbanecm@deploy1001: Started scap: SWAT: 44ec4e4: e1baf0e: 3c02aa7: Namespace changes
  • 11:27 ema: cp2010: ats-backend-restart to clear backend restart alert
  • 11:21 ema: depool cp2012 and reimage as text_ats T227432
  • 11:15 ema: pool cp2010 with ATS backend T227432
  • 10:54 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716 (duration: 13m 56s)
  • 10:34 ema: depool cp2010 and reimage as text_ats T227432
  • 10:30 marostegui: Upgrade db1116
  • 10:22 mobrovac@deploy1001: Started deploy [restbase/deploy@daa7808]: Revert switching test2.wp to Parsoid/JS - T238716
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P9694 and previous config saved to /var/cache/conftool/dbconfig/20191120-101727-marostegui.json
  • 10:14 marostegui: Compress db2095:3314
  • 10:07 mobrovac@deploy1001: Finished deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716 (duration: 14m 54s)
  • 09:56 marostegui: Compress db2106
  • 09:52 mobrovac@deploy1001: Started deploy [restbase/deploy@c677063]: Switch test2.wp back to Parsoid/JS temporarily - T238716
  • 09:48 marostegui: Compress dbstore1005:3318
  • 09:47 marostegui: Compress dbstore1004:3314
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9693 and previous config saved to /var/cache/conftool/dbconfig/20191120-093308-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9692 and previous config saved to /var/cache/conftool/dbconfig/20191120-092337-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 after upgrade', diff saved to https://phabricator.wikimedia.org/P9691 and previous config saved to /var/cache/conftool/dbconfig/20191120-090739-marostegui.json
  • 08:55 marostegui: Upgrade db1094
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for upgrade', diff saved to https://phabricator.wikimedia.org/P9690 and previous config saved to /var/cache/conftool/dbconfig/20191120-085448-marostegui.json
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:43 marostegui: Promote db2132 as m1-codfw master - T238183
  • 07:19 marostegui: Upgrade db2062
  • 07:19 marostegui: Upgrade db2078
  • 07:14 marostegui: Deploy schema change on s3 (testwikidatawiki) directly on s3 primary master T237120
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P9688 and previous config saved to /var/cache/conftool/dbconfig/20191120-070511-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1136', diff saved to https://phabricator.wikimedia.org/P9687 and previous config saved to /var/cache/conftool/dbconfig/20191120-065718-marostegui.json
  • 06:44 marostegui: Upgrade db2118 (s7 codfw master)
  • 06:41 marostegui: Repool labsdb1011
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1136 into s7 api', diff saved to https://phabricator.wikimedia.org/P9686 and previous config saved to /var/cache/conftool/dbconfig/20191120-064022-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136 after upgrade', diff saved to https://phabricator.wikimedia.org/P9685 and previous config saved to /var/cache/conftool/dbconfig/20191120-063628-marostegui.json
  • 06:28 marostegui: Upgrade db1136
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for upgrade', diff saved to https://phabricator.wikimedia.org/P9684 and previous config saved to /var/cache/conftool/dbconfig/20191120-062749-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after upgrade', diff saved to https://phabricator.wikimedia.org/P9683 and previous config saved to /var/cache/conftool/dbconfig/20191120-062029-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9682 and previous config saved to /var/cache/conftool/dbconfig/20191120-061938-marostegui.json
  • 05:58 marostegui: Stop MySQL on db1101:3317, db1101:3318 for upgrade and schema change
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for upgrade and schema change', diff saved to https://phabricator.wikimedia.org/P9681 and previous config saved to /var/cache/conftool/dbconfig/20191120-055732-marostegui.json
  • 05:55 marostegui: Depool labsdb1011 for upgrade
  • 05:54 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1105:3311 db1097:3314 db1098:3316 db1098:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9680 and previous config saved to /var/cache/conftool/dbconfig/20191120-055426-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P9679 and previous config saved to /var/cache/conftool/dbconfig/20191120-054840-marostegui.json
  • 03:16 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php kowiki --cutoff 350
  • 02:57 vgutierrez: restarting pybal on lvs2002
  • 02:54 vgutierrez: restarting pybal on lvs2005
  • 02:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 02:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 00:10 mutante: phab2001 - restart ssh-phab service after repooling it after buster reinstall; it wasn't listening on the IPv6 IP, causing LVS/pybal alerts
  • 00:06 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Pass token as editing_session_id for suggested edits (T238249) (duration: 00m 53s)
  • 00:02 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 52s)
  • 00:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikiEditor/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 54s)

2019-11-19

  • 23:58 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MobileFrontend/: EditAttemptStep: Allow overriding session ID (T238249) (duration: 00m 53s)
  • 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/WikimediaEvents/: EditAttemptStep: Allow other extensions to trigger oversampling (T238249) (duration: 00m 53s)
  • 23:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 21:45 XioNoX: rebooting pfw3-codfw:node1 for upgrade - T235150
  • 21:14 XioNoX: rebooting pfw3-codfw for upgrade - T235150
  • 20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:17 gehel: completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 20:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:10 XioNoX: homer push on mgmt routers
  • 20:09 mutante: phab1003 - after merging gerrit:551910, puppet now also stopped the actual aphlict service and removed the systemd unit file. Had to manually run 'systemctl reset-failed' though to clean systemd status and avoid icinga alert (T238593)
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:18 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 19:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286) (duration: 06m 49s)
  • 19:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@6e6bd42]: Prevent expensive content transforms from blocking the event loop (T229286)
  • 19:00 elukey: regenerate TLS cert for yarn.wikimedia.org (containing SANs for all analytics UIs) to add datasets.w.o SAN (site was failing due to ATS not being able to contact thorium)
  • 18:59 rlazarus: restarted php7.2-fpm on wtp2001, wtp2002
  • 18:56 rlazarus: restarted php7.2-fpm on wtp1025, wtp1026
  • 18:35 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/: Unbreak instrumentation of init events (duration: 00m 53s)
  • 18:34 ssastry@deploy1001: Finished deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7 (duration: 02m 04s)
  • 18:32 ssastry@deploy1001: Started deploy [parsoid/deploy@6e7cffd]: Updating Parsoid to 1a1105a7
  • 18:30 mutante: icinga config - manually added team-dcops, started icinga
  • 18:20 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, hook) gerrit:551858 (duration: 00m 53s)
  • 18:12 RoanKattouw: That was eowiktionary, not eowikisource
  • 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure default search namespaces for eowikisource (T237792) (duration: 00m 52s)
  • 17:43 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, maint script) gerrit:551857 (duration: 00m 52s)
  • 17:39 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:11 addshore@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag) gerrit:551855 gerrit:551856 (duration: 00m 54s)
  • 17:02 volker-e@deploy1001: Finished deploy [design/style-guide@d73818a]: Deploy design/style-guide: (duration: 00m 07s)
  • 17:02 volker-e@deploy1001: Started deploy [design/style-guide@d73818a]: Deploy design/style-guide:
  • 16:58 ema: pool cp2007 with ATS backend T227432
  • 16:30 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:28 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 moritzm: installing glib2.0 security updates
  • 16:21 mutante: phab1003 - puppet restarts aphlict service even with "phabricator_aphlict_enabled: false" in Hiera, but it does properly remove the proxy config lines from apache, so the service is running but not used. (T238593)
  • 16:17 mutante: phab1003 - systemctl stop aphlict (proxy config in apache is disabled as well as disabled in ATS) (T238593)
  • 16:15 gehel: reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826
  • 16:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:10 ema: depool cp2007 and reimage as text_ats T227432
  • 16:09 ema: pool cp2006 with ATS backend T227432
  • 15:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure (duration: 02m 11s)
  • 15:57 mobrovac@deploy1001: Started deploy [restbase/deploy@564b2c6]: New Parsoid/PHP config structure
  • 15:37 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:34 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015 (duration: 14m 22s)
  • 15:15 ema: depool cp2006 and reimage as text_ats T227432
  • 15:13 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759]: Switch test.wp and test2.wp to Parsoid/PHP - T229015
  • 15:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP (duration: 02m 58s)
  • 15:07 ema: pool cp2004 with ATS backend T227432
  • 15:06 mobrovac@deploy1001: Started deploy [restbase/deploy@5e7f759] (dev-cluster): Switch test.wp and test2.wp to Parsoid/PHP
  • 14:38 ema@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:34 gehel: restarting blazegraph with additional logging on wdqs1004 - T231411
  • 14:18 ema: depool cp2004 and reimage as text_ats T227432
  • 14:13 ema: pool cp2001 with ATS backend T227432
  • 13:57 marostegui: Deploy schema change on metawiki directly on s7 master T238370
  • 13:57 marostegui: Deploy schema change on mediawikiwiki directly on s7 master T238370
  • 13:55 marostegui: Deploy schema change on mediawikiwiki directly on s3 master T238370
  • 13:50 marostegui: Deploy schema change on foundationwiki directly on s3 master - T238370
  • 13:46 marostegui: Deploy schema change on labswiki (wikitech) - T238370
  • 13:39 marostegui: Deploy schema change on db1092
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P9673 and previous config saved to /var/cache/conftool/dbconfig/20191119-133850-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9672 and previous config saved to /var/cache/conftool/dbconfig/20191119-133704-marostegui.json
  • 13:34 ema@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:33 ema@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:14 ema: depool cp2001 and reimage as text_ats T227432
  • 12:42 jbond42: add libapache2-mod-auth-cas 1.2-1 to stretch-wikimedia repo
  • 12:28 effie: enable puppet on P:mediawiki::php and *.eqiad.wmnet
  • 12:22 effie: enable puppet on P:mediawiki::php and *.codfw.wmnet
  • 12:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 12:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1067 from config T238297 (duration: 00m 52s)
  • 11:41 gehel: depooling wdqs1004 - T231411
  • 11:37 gehel: restarting wdqs blazegraph on wdqs1004 - T231411
  • 11:29 marostegui: Upgrade dbstore1003 (3311,3315,3317)
  • 11:16 gehel: restarting wdqs updater on wdqs1004 - T231411
  • 10:36 marostegui: Compress and upgrade db1098:3316
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9671 and previous config saved to /var/cache/conftool/dbconfig/20191119-103540-marostegui.json
  • 10:34 marostegui: Compress and upgrade db1098:3317
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for upgrade and compression', diff saved to https://phabricator.wikimedia.org/P9670 and previous config saved to /var/cache/conftool/dbconfig/20191119-103426-marostegui.json
  • 10:29 marostegui: Upgrade db2077
  • 10:24 marostegui: Upgrade db2120 db2121 db2122
  • 10:10 marostegui: Upgrade MySQL on db2086 db2087 db2100
  • 10:06 godog: repool centrallog2001
  • 09:40 effie: disable puppet on P:mediawiki::php - T229792
  • 09:21 moritzm: installing ncurses security updates
  • 09:20 moritzm: rolling restart of nginx on acmechief/puppetdb to pick up libxslt security updates
  • 09:08 moritzm: installing libxslt security updates
  • 09:08 marostegui: Deploy schema change on db1101:3318
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9669 and previous config saved to /var/cache/conftool/dbconfig/20191119-090823-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9668 and previous config saved to /var/cache/conftool/dbconfig/20191119-090745-marostegui.json
  • 09:05 marostegui: Repool labsdb1010
  • 07:33 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Enable math links in Beta - T208758 (duration: 00m 53s)
  • 06:45 marostegui: Stop MySQL on db2061 T238526
  • 06:44 marostegui: Remove db2061 from tendril and zarcillo T238526
  • 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2061 from config T238526 (duration: 00m 52s)
  • 06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2061 from config T238526 (duration: 00m 53s)
  • 06:26 vgutierrez: Move cp1089 from nginx to ats-tls - T231627
  • 06:20 marostegui: Depool labsdb1010 for upgrade
  • 06:02 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1131 to s6 master and remove read-only from s6 T235469', diff saved to https://phabricator.wikimedia.org/P9667 and previous config saved to /var/cache/conftool/dbconfig/20191119-060203-marostegui.json
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance T235469', diff saved to https://phabricator.wikimedia.org/P9666 and previous config saved to /var/cache/conftool/dbconfig/20191119-060122-marostegui.json
  • 06:01 marostegui: Starting s6 failover from db1061 to db1131 - T235469
  • 05:37 eileen: process control - I reverted the above to check some stuff first
  • 05:36 vgutierrez: Move cp1087 from nginx to ats-tls - T231627
  • 05:26 marostegui: Deploy schema change on db1099:3318
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P9665 and previous config saved to /var/cache/conftool/dbconfig/20191119-052632-marostegui.json
  • 05:25 marostegui: Compress db1097:3314
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for compression', diff saved to https://phabricator.wikimedia.org/P9664 and previous config saved to /var/cache/conftool/dbconfig/20191119-052412-marostegui.json
  • 05:17 vgutierrez: Move cp1085 from nginx to ats-tls - T231627
  • 05:14 marostegui: Compress tables on db1105:3311
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9663 and previous config saved to /var/cache/conftool/dbconfig/20191119-051344-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after compression', diff saved to https://phabricator.wikimedia.org/P9662 and previous config saved to /var/cache/conftool/dbconfig/20191119-051259-marostegui.json
  • 05:12 eileen: process-control config revision is 9fbfc79988 - change gap on repair job to 16 hours to reflect the with-daylight-savings ones
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T235469 ', diff saved to https://phabricator.wikimedia.org/P9661 and previous config saved to /var/cache/conftool/dbconfig/20191119-050748-marostegui.json
  • 05:02 marostegui: Start pre-switchover steps T235469
  • 04:47 vgutierrez: Move cp2023 from nginx to ats-tls - T231627
  • 04:17 vgutierrez: Move cp2019 from nginx to ats-tls - T231627
  • 03:53 vgutierrez: Move cp2016 from nginx to ats-tls - T231627
  • 03:51 tgr: T208369 ran mwscript extensions/GrowthExperiments/maintenance/deleteOldSurveys.php cswiki --cutoff 350
  • 03:37 vgutierrez: Move cp2013 from nginx to ats-tls - T231627
  • 01:12 ejegg: re-enabled fundraising CiviCRM contact de-duplication jobs
  • 01:05 ejegg: disabled fundraising CiviCRM contact de-duplication jobs
  • 00:54 ejegg: updated civicrm from 1f454aa69a to 2802bdd649
  • 00:39 mutante: phab2001 - rsyncing /srv/repos data from phab1003 (T190568)
  • 00:30 mutante: rebooting phab2001

2019-11-18

  • 23:52 catrope@deploy1001: Finished scap: Update GrowthExperiments to master in wmf.5 (includes i18n) (duration: 19m 57s)
  • 23:37 mutante: phab2001 - restart ssh-phab service after reimaging (some race condition binding to the IP before getting it on the interface after fresh install), reschedule pybal checks (T190568)
  • 23:32 catrope@deploy1001: Started scap: Update GrowthExperiments to master in wmf.5 (includes i18n)
  • 22:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001.codfw.wmnet
  • 22:39 eileen: civicrm revision changed from c05c302e54 to 1f454aa69a, config revision is 67685c12f5
  • 22:31 mutante: phab2001 - reinstalling with buster (T190568)
  • 21:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 21:57 arlolra: Upgraded Parsoid to 2245b8f (T237886, T237103, T236864, T237569, T236930, T237463, T236867, T234266)
  • 21:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 21:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f (duration: 08m 22s)
  • 21:39 arlolra@deploy1001: Started deploy [parsoid/deploy@c6a457f]: Updating Parsoid to 2245b8f
  • 20:59 mutante: phab1003 - re-enabling puppet after merging gerrit:551271 - making sure aphlict stays disabled incl. the apache config ProxyPass lines using mod_proxy_wstunnel (T238593)
  • 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after some compression', diff saved to https://phabricator.wikimedia.org/P9659 and previous config saved to /var/cache/conftool/dbconfig/20191118-202259-marostegui.json
  • 19:03 ejegg: updated payments-wiki from 30579d34d8 to 3f99ebecc7
  • 18:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater (duration: 13m 27s)
  • 18:07 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@582d394]: New WDQS build with merging updater
  • 17:44 cdanis: rebooting grafana1002 (currently test host not used in prod)
  • 17:08 marostegui: Deploy schema change on db1116:3318
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9658 and previous config saved to /var/cache/conftool/dbconfig/20191118-165410-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after compression', diff saved to https://phabricator.wikimedia.org/P9656 and previous config saved to /var/cache/conftool/dbconfig/20191118-164923-marostegui.json
  • 16:40 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕦 sudo -E reprepro --restrict grafana update buster-wikimedia
  • 16:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on remaining wikis for T198312 (duration: 00m 53s)
  • 14:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 13m 58s)
  • 14:34 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c]: Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:34 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523 (duration: 02m 30s)
  • 14:31 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary - T229015 T238523
  • 14:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary (duration: 02m 45s)
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b288c] (dev-cluster): Parsoid: mirror traffic in split mode; add minwiktionary
  • 14:27 arturo: imported openstack ocata deb packages into stretch-wikimedia/thirdparty/openstack-ocata-stretch (T238338)
  • 14:22 marostegui: Deploy schema change on dbstore1005:3318
  • 13:10 ema: cp-ats: rolling ats-{tls,backend} restart to apply log_buffer_size config changes T237608
  • 12:51 Urbanecm: Run mwscript recountCategories.php --wiki=cswiki --mode={subcats,pages,files} (T228585)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=files (T238500)
  • 12:48 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=pages (T238500)
  • 12:47 Urbanecm: Run mwscript recountCategories.php --wiki=dewiki --mode=subcats (T238500)
  • 11:32 awight: EU SWAT complete
  • 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Cite: SWAT: Track pageviews only on content page views, not edits (T214493) (duration: 00m 51s)
  • 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Popups: SWAT: Don't record Popups actions on non-content pages (T214493) (duration: 00m 51s)
  • 11:04 moritzm: installing postgresql-common security updates
  • 10:56 moritzm: installing python-werkzeug security updates
  • 10:56 marostegui: Deploy schema change on db2078 (codfw master for wikidatawiki), this will create lag on s8 codfw - T237120
  • 10:53 moritzm: installing gdb updates from buster point release
  • 10:49 moritzm: installing python-cryptography bugfix updates from buster point release
  • 10:45 moritzm: updated buster netinst image for 10.2 T238519
  • 10:16 marostegui: Upgrade MySQL on labsdb1012
  • 09:33 godog: remove wezen from service, pending reimage
  • 09:11 marostegui: Remove ar_comment from triggers on db2094:3318 - T234704
  • 09:11 marostegui: Deploy schema change on s8 codfw, this will generate lag on s8 codfw - T233135 T234066
  • 09:03 marostegui: Restart MySQL on db1124 and db1125 to apply new replication filters T238370
  • 07:17 marostegui: Upgrade and restart mysql on sanitarium hosts on codfw to pick up new replication filters: db2094 and db2095 - T238370
  • 07:09 marostegui: Stop MySQL on db2070 to clone db2135 - T238183
  • 06:52 vgutierrez: Move cp1083 from nginx to ats-tls - T231627
  • 06:32 vgutierrez: Move cp1081 from nginx to ats-tls - T231627
  • 06:30 marostegui: Restart tendril mysql - T231769
  • 06:12 vgutierrez: Move cp2012 from nginx to ats-tls - T231627
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for compression', diff saved to https://phabricator.wikimedia.org/P9652 and previous config saved to /var/cache/conftool/dbconfig/20191118-060508-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for compression', diff saved to https://phabricator.wikimedia.org/P9651 and previous config saved to /var/cache/conftool/dbconfig/20191118-060207-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2072, db2088:3311, db2087:3316, db2086:3317 after maintenances and schema changes', diff saved to https://phabricator.wikimedia.org/P9650 and previous config saved to /var/cache/conftool/dbconfig/20191118-060114-marostegui.json
  • 05:53 marostegui: Deploy schema change on s5 primary master db1100 - T233135 T234066
  • 03:40 vgutierrez: Move cp2007 from nginx to ats-tls - T231627
  • 00:44 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/PageHistoryCountHandler.php: fix extremely slow query T238378 (duration: 00m 59s)

2019-11-16

  • 20:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:25 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:17 effie: restart rsyslog on mw2221
  • 09:43 elukey: systemctl restart hadoop-* on analytics1077 after oom killer

2019-11-15

  • 22:14 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:31 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:29 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 _joe_: disabling proxying to ws on phabricator1003
  • 20:04 XioNoX: push pfw policies to pfw3-eqiad - T238368
  • 20:02 XioNoX: push pfw policies to pfw3-codfw - T238368
  • 19:07 XioNoX: remove vlan 1 trunking between msw1-codfw and mr1-codfw, will cause a quick connectivity issue - T228112
  • 18:07 XioNoX: homer push on management switches
  • 17:30 mutante: phabricator - started phd service
  • 17:11 XioNoX: homer push to management routers (https://gerrit.wikimedia.org/r/550576)
  • 16:43 hashar: Restored zuul-merger / CI for operations/puppet.git
  • 16:29 hashar: CI slowed down due to a huge spike of internal jobs. Being flushed as of now # T140297
  • 16:25 bblack: repool cp2001
  • 16:08 bblack: depool cp2001 for experiments
  • 16:02 moritzm: rebooting rpki1001 to rectify microcode loading
  • 16:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:51 ejegg: updated Fundraising CiviCRM from ae9b3819cd to c05c302e54
  • 15:36 ejegg: reduced batch size of CiviCRM contact deduplication jobs
  • 15:11 ema: pool cp3064 with ATS backend T227432
  • 15:07 ema: reboot cp3064 after reimage
  • 14:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:49 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 ema: depool cp3064 and reimage as text_ats T227432
  • 14:17 godog: SIGHUP prometheus@ops on prometheus1004
  • 14:13 bblack: lvs1013 - pybal restart for new config
  • 14:13 bblack: lvs2001 - pybal restart for new config
  • 14:13 bblack: lvs5001 - pybal restart for new config
  • 14:13 bblack: lvs4005 - pybal restart for new config
  • 14:12 bblack: lvs3005 - pybal restart for new config
  • 14:11 bblack: lvs5003 - pybal restart for new config
  • 14:11 bblack: lvs4007 - pybal restart for new config
  • 14:11 bblack: lvs3007 - pybal restart for new config
  • 14:10 bblack: lvs2004 - pybal restart for new config
  • 14:09 bblack: lvs1016 - pybal restart for new config
  • 13:28 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 03s)
  • 13:28 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 13:06 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure) (duration: 00m 04s)
  • 13:06 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure)
  • 11:43 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 09s)
  • 11:43 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 11:27 moritzm: reboot ganeti4001-4003 to rectify microcode application
  • 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315 into vslow,dump after schema change', diff saved to https://phabricator.wikimedia.org/P9645 and previous config saved to /var/cache/conftool/dbconfig/20191115-112520-marostegui.json
  • 11:19 marostegui: Reboot dbproxy2002
  • 11:15 marostegui: Reboot dbproxy2004
  • 11:12 marostegui: Reboot dbproxy2001
  • 10:45 marostegui: Run maintain-views for s5 on labsdb1011 T233135
  • 10:38 moritzm: installing ghostscript security updates
  • 10:37 mobrovac: restbase - truncated parsoidphp data tables - T229015
  • 10:36 ema: pool cp3062 with ATS backend T227432
  • 10:24 godog: roll-restart logstash to apply configuration change
  • 10:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 ema: depool cp3062 and reimage as text_ats T227432
  • 09:47 vgutierrez: Use a synthetic warning for 1% of TLSv1/TLSv1.1 pageviews - T238038
  • 09:18 vgutierrez: Move cp1079 from nginx to ats-tls - T231627
  • 09:13 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 09:02 vgutierrez: Move cp1077 from nginx to ats-tls - T231627
  • 08:42 vgutierrez: Move cp2006 from nginx to ats-tls - T231627
  • 08:30 vgutierrez: Move cp2004 from nginx to ats-tls - T231627
  • 06:41 marostegui: Stop MySQL on db2065 to clone db2134 (this will trigger an haproxy irc alert) - T238183
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change and temporary pool db1082 into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9643 and previous config saved to /var/cache/conftool/dbconfig/20191115-060807-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9642 and previous config saved to /var/cache/conftool/dbconfig/20191115-060425-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 db1082 after schema changes', diff saved to https://phabricator.wikimedia.org/P9641 and previous config saved to /var/cache/conftool/dbconfig/20191115-060300-marostegui.json
  • 05:57 marostegui: Run maintain-views for s5 on labsdb1009, labsdb1010, labsdb1012 (pending labsdb1011 as it is still running the schema change) T233135
  • 05:07 vgutierrez: Move cp3064 from nginx to ats-tls - T231627
  • 04:38 volker-e@deploy1001: Finished deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:38 volker-e@deploy1001: Started deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide:
  • 04:17 vgutierrez: Move cp3062 from nginx to ats-tls - T231627
  • 04:00 vgutierrez: Move cp3060 from nginx to ats-tls - T231627
  • 01:35 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/CompareHandler.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 53s)
  • 01:33 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/coreRoutes.json: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 52s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/parser/Parser.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 54s)

2019-11-14

  • 23:03 mutante: restarting gerrit to increase defaultThreadPoolSize to 2
  • 22:29 eileen: civicrm revision changed from a3714003ff to ae9b3819cd, config revision is 6adc66a20b
  • 21:32 ssastry@deploy1001: Finished deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415 (duration: 08m 21s)
  • 21:24 ssastry@deploy1001: Started deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415
  • 21:14 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:06 cdanis@cumin2001: dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json
  • 20:04 gehel: reloading data on wdqs1004 from wdqs1007 to catch up on lag faster - T238229
  • 19:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:31 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:49 catrope@deploy1001: Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)
  • 18:37 catrope@deploy1001: Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)
  • 18:35 catrope@deploy1001: Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)
  • 18:34 mutante: scandium - restart php7.2-fpm
  • 18:31 mutante: phabricator (phab1003, prod server) - upgrade PHP version to 7.2.24 (T237239)
  • 18:17 cdanis@cumin2001: dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json
  • 17:46 robh: running dell epsa tool on cp3056 per T236497
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 ejegg: updated payments-wiki from bd907656fb to 30579d34d8
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 16:09 mutante: phab2001 - upgrading PHP version to 7.2.24 (T237239)
  • 16:06 mutante: scandium - upgrading PHP version to 7.2.24 (fyi, @subbu T228069) (T237239)
  • 16:04 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase: Put a layer of APC cache on top of reading wb_terms in SqlEntityInfoBuilder (T231011 T229407 T236681), Try II (duration: 00m 56s)
  • 14:54 ema: pool cp3060 with ATS backend T227432
  • 14:53 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix bug when looking up entity for an unknown ID (duration: 00m 53s)
  • 14:48 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group1 for T198312 (duration: 00m 53s)
  • 14:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: depool cp3060 and reimage as text_ats T227432
  • 13:37 ladsgroup@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 13:35 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 13:06 bblack: removing digicert-2019 files from cache nodes - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/550829/
  • 12:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation (duration: 14m 52s)
  • 12:09 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation
  • 11:58 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation (duration: 02m 50s)
  • 11:55 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation
  • 11:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:48 vgutierrez: Rolling restart of ats-tls/ats-backend to upgrade to 8.0.5-1wm11 - T238307
  • 10:44 vgutierrez: uploaded trafficserver-8.0.5-1wm11 to apt.wikimedia.org (stretch) - T238307
  • 10:43 ema: pool cp3058 with ATS backend T227432
  • 10:25 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:20 godog: netbox1001 bandaid/symlink /srv/deployment/netbox/deploy/src/netbox/project-static to 'static'
  • 10:06 gehel: copying journal from wdqs1007 to wdqs1005 - T238232
  • 10:05 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 10:03 Urbanecm: Run deleteEqualMessages.php --delete for cswiki and viwiki
  • 09:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 gehel: depool wdqs (public) eqiad - high lag - T238229
  • 09:34 ema: depool cp3058 and reimage as text_ats T227432
  • 09:31 marostegui: Compare wikidatawiki.pagelinks between labsdb1011 and labsdb1010 - T233986
  • 09:25 moritzm: installing ghostscript updates on thumbor1001
  • 09:24 marostegui: Stop mysql on db2067 to clone db2133 - T238183
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Full weight to db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9635 and previous config saved to /var/cache/conftool/dbconfig/20191114-092006-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 marostegui: Compare wikidatawiki.pagelinks between db1124:3318 and labsdb1010 - T233986
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 marostegui: Remove ar_comment from triggers on db1124:3315 - T234704
  • 08:41 marostegui: Deploy schema change with replication on db1082, this will generate lag on s5 labs - T233135 T234066
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P9634 and previous config saved to /var/cache/conftool/dbconfig/20191114-084043-marostegui.json
  • 08:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P9633 and previous config saved to /var/cache/conftool/dbconfig/20191114-083729-marostegui.json
  • 08:03 eileen: process-control config revision is 6adc66a20b re-enable backfill
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool a non partitioned slave db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9632 and previous config saved to /var/cache/conftool/dbconfig/20191114-080038-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 T235599', diff saved to https://phabricator.wikimedia.org/P9631 and previous config saved to /var/cache/conftool/dbconfig/20191114-075449-marostegui.json
  • 07:41 eileen: process-control config revision is b7c2cf7227 - disabled backfill again - some error?
  • 07:29 eileen: process-control config revision is 909108622d re-enable omnirecipient date repair job
  • 07:25 eileen: process-control config revision is d3ebeddcc1 (I re-enabled the old backfill job)
  • 07:12 moritzm: installing intel-microcode updates
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1067', diff saved to https://phabricator.wikimedia.org/P9630 and previous config saved to /var/cache/conftool/dbconfig/20191114-065309-marostegui.json
  • 06:16 marostegui: Stop replication on db1067
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1083 to s1 master and remove read-only from s1 T234800', diff saved to https://phabricator.wikimedia.org/P9629 and previous config saved to /var/cache/conftool/dbconfig/20191114-060138-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T234800', diff saved to https://phabricator.wikimedia.org/P9628 and previous config saved to /var/cache/conftool/dbconfig/20191114-060026-marostegui.json
  • 06:00 marostegui: Starting s1 failover from db1067 to db1083 - T234800
  • 05:51 jynus: stopping db1114 replication
  • 05:34 marostegui: Compress db2089:3316 - T235599
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P9627 and previous config saved to /var/cache/conftool/dbconfig/20191114-052400-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P9626 and previous config saved to /var/cache/conftool/dbconfig/20191114-052303-marostegui.json
  • 05:13 marostegui: Move replicas from db1067 to db1083 T234800
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1083 with weight 0 T234800', diff saved to https://phabricator.wikimedia.org/P9625 and previous config saved to /var/cache/conftool/dbconfig/20191114-050940-marostegui.json
  • 05:08 vgutierrez: Repooling cp1077 - T238289
  • 05:07 marostegui: Start pre-failover steps T234800
  • 05:01 kart_: Updated cxserver to 2019-11-13-111130-production tag (T237379, T235748, T236906)
  • 04:56 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:51 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:49 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 03:49 vgutierrez: power cycling cp1077 - T238289
  • 03:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 03:49 vgutierrez: depooling cp1077 - T238289
  • 00:41 ebernhardson: T237849 Start CirrusSearch forceSearchIndex.php commonswiki 2019-10-20T00:00:00 - 2019-11-14T01:00:00 pushing into jobqueue
  • 00:40 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 49s)
  • 00:39 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:39 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 44s)
  • 00:38 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:36 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php: T237849: Restore CirrusSearchBuildDocumentParse hook (duration: 00m 54s)

2019-11-13

  • 23:00 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:58 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:25 catrope@deploy1001: Finished scap: For some reason that limited i18n sync didn't work, trying a full scap (duration: 18m 33s)
  • 22:07 catrope@deploy1001: Started scap: For some reason that limited i18n sync didn't work, trying a full scap
  • 22:04 catrope@deploy1001: scap sync-l10n completed (1.35.0-wmf.5) (duration: 02m 54s)
  • 22:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Update to master (b937dce) (duration: 00m 54s)
  • 20:17 XioNoX: delete unused asw2-esams:ae1
  • 19:37 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (again) (duration: 00m 52s)
  • 18:49 Jeff_Green: authdns-update to remove host alnilam
  • 17:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (duration: 00m 53s)
  • 16:41 gehel: depool wdqs1005 - T238232
  • 16:36 gehel: restart blazegraph on wdqs1005
  • 16:21 ema: pool cp3054 with ATS backend T227432
  • 16:21 gehel: draining elastic1017-1031 to prepare for decommission - T230746
  • 16:02 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P9621 and previous config saved to /var/cache/conftool/dbconfig/20191113-155134-marostegui.json
  • 15:39 moritzm: powercycle cloudbackup2002
  • 15:35 ema: depool cp3054 and reimage as text_ats T227432
  • 15:32 moritzm: rebooting cloudbackup2002
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:29 jynus: shutdown db2072 T237905
  • 15:29 gehel: configuration of new elasticsearch servers completed, all working and pooled - T230746
  • 14:55 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9620 and previous config saved to /var/cache/conftool/dbconfig/20191113-145541-jynus.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P9619 and previous config saved to /var/cache/conftool/dbconfig/20191113-134938-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9618 and previous config saved to /var/cache/conftool/dbconfig/20191113-134625-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9617 and previous config saved to /var/cache/conftool/dbconfig/20191113-133410-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for upgrade', diff saved to https://phabricator.wikimedia.org/P9616 and previous config saved to /var/cache/conftool/dbconfig/20191113-132216-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P9615 and previous config saved to /var/cache/conftool/dbconfig/20191113-131530-marostegui.json
  • 11:56 effie: Upgrade to php 7.2.24-1 mediawiki eqiad hosts and restart php-fpm - T237239
  • 11:55 ema: cp-ats: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:46 moritzm: rebooting cloudcontrol2001-dev for microcode debugging
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 moritzm: rebooting labtestpuppetmaster2001 for microcode debugging
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:27 ema: cp-ats-ulsfo: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:27 moritzm: rebooting cloudcontrol2003-dev for some microcode debugging
  • 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:24 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9614 and previous config saved to /var/cache/conftool/dbconfig/20191113-110802-marostegui.json
  • 11:05 Urbanecm: EU SWAT done
  • 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ffwiki* (T238191)
  • 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 0a90ef9: Update localized logos for the Fula Wikipedia (T238191) (duration: 00m 54s)
  • 10:53 vgutierrez: Testing ats-tls-restart on cp5007 - T237425
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9613 and previous config saved to /var/cache/conftool/dbconfig/20191113-104326-marostegui.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9612 and previous config saved to /var/cache/conftool/dbconfig/20191113-103225-marostegui.json
  • 10:27 gehel: start configuration of new elasticsearch servers - T230746
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9610 and previous config saved to /var/cache/conftool/dbconfig/20191113-102054-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9609 and previous config saved to /var/cache/conftool/dbconfig/20191113-101127-marostegui.json
  • 09:51 jynus: upgraded wmf-mariadb101-client on cumin hosts
  • 09:50 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:43 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:41 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 09:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374 (duration: 11m 19s)
  • 09:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374
  • 09:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki (duration: 02m 35s)
  • 09:06 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki
  • 08:25 marostegui: Stop MySQL on db2062 to copy its data to db2132 T238183
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:09 marostegui: Fix replication on labsdb1010 - T233986
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P9607 and previous config saved to /var/cache/conftool/dbconfig/20191113-070339-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 for compression', diff saved to https://phabricator.wikimedia.org/P9606 and previous config saved to /var/cache/conftool/dbconfig/20191113-070055-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9605 and previous config saved to /var/cache/conftool/dbconfig/20191113-065952-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P9604 and previous config saved to /var/cache/conftool/dbconfig/20191113-065823-marostegui.json
  • 06:25 volker-e@deploy1001: Finished deploy [design/style-guide@edce4cc]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:25 volker-e@deploy1001: Started deploy [design/style-guide@edce4cc]: Deploy design/style-guide:
  • 01:35 eileen: civicrm revision changed from 3c15db25bb to a3714003ff, config revision is d678dbcaa5

2019-11-12

  • 23:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix: Do not return after inserting a single suggestion (duration: 00m 52s)
  • 23:51 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/src/mediawiki.interface.helpers.styles.less: Remove extraneous semicolons (T233649), part 2 (duration: 00m 52s)
  • 23:49 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes/changes/ChangesList.php: Remove extraneous semicolons (T233649), part 1 (duration: 00m 53s)
  • 23:49 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:45 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:22 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:20 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 bblack: repool cp1076 (experiments concluded)
  • 22:35 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: enabling REST API (duration: 00m 52s)
  • 22:34 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: enabling REST API (duration: 00m 52s)
  • 22:32 eileen: civicrm revision changed from bfa53ee611 to 3c15db25bb, config revision is d678dbcaa5
  • 21:54 bblack: depooling cp1076 for some local experimentation
  • 20:18 herron: reprepro copy buster-wikimedia stretch-wikimedia prometheus-elasticsearch-exporter
  • 20:11 otto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:11 otto@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:46 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P7007 --new-data-type external-id (T234221)
  • 19:45 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P4839 --new-data-type external-id (T234221)
  • 19:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Sync a previously undeployed change to InitialiseSettings-labs.php that someone forgot to deploy (as a no-op) in production (duration: 00m 52s)
  • 19:41 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group0 for T198312 (duration: 00m 52s)
  • 19:19 arlolra: Updated Parsoid to 6a0a708 (T215000, T235295, T235656, T235217, T235295, T236846, T237556, T235231)
  • 19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708 (duration: 10m 09s)
  • 18:58 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Final fixes and tweaks for testing (duration: 00m 53s)
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708
  • 18:39 ejegg: re-enabled Omnimail and contact de-duplication jobs
  • 18:20 Urbanecm: Morning SWAT done
  • 18:18 Urbanecm: Deploy security patch for T237887
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 130ef87: Add right "abusefilter-log-private" to usergroup "rollbacker" at ptwiki (T237830) (duration: 00m 53s)
  • 18:08 XioNoX: push pfw change to add recdns anycast IP
  • 17:33 XioNoX: update fasw-c-eqiad to match current standard (ntp/users/rootpw/lldp)
  • 17:22 XioNoX: update fasw-c-codfw to match current standard (ntp/users/rootpw/lldp)
  • 17:03 ema: pool cp3052 with ATS backend T238085
  • 17:03 ema: pool cp3052 with ATS backend T227432
  • 16:53 bblack: cpNNNN (all cache nodes) - cumin manual removal of globalsign-2018 remnants (key, cert, ocsp config, ocsp output)
  • 16:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 XioNoX: setup bgp session from cr2-codfw to multihop RIS collector - T106056
  • 16:21 XioNoX: reboot scs-c1-eqiad.mgmt.eqiad.wmnet - T238036
  • 16:09 ema: depool cp3052 and observe performance impact T238085 before reimaging as text_ats T227432
  • 15:49 marostegui: Deploy schema change on db1102:3315 T233135 T234066
  • 15:45 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fixes and tweaks for initial rollout (duration: 00m 53s)
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for a schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9600 and previous config saved to /var/cache/conftool/dbconfig/20191112-154127-marostegui.json
  • 15:24 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=schema
  • 14:46 bblack: cpNNNN (all caches): remove stale outputs from transient ocsp failures ( /var/cache/ocsp/update-ocsp-*.tmp )
  • 14:41 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 14:38 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4021.ulsfo.wmnet,service=nginx
  • 14:35 ema: cp4021: ats-tls-restart to see if https://gerrit.wikimedia.org/r/550475 fixed the script
  • 14:16 Jeff_Green: authdns-update to deploy fundraising-read.wmnet service cname adjustment
  • 14:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set all of wikidata for write both for term store" (duration: 00m 52s)
  • 12:57 godog: refresh kibana field list
  • 12:46 gehel: repool wdqs1004
  • 12:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 100 (T237984)
  • 12:19 onimisionipe: restarting blazegraph on wdqs1005
  • 12:11 effie: Reimage mwdebug1002 - T214734
  • 11:47 Amir1: EU SWAT is done
  • 11:47 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase term store error reduction, Do not catch DBError in ReplicaMasterAwareRecordIdsAcquirer. (T236466) (duration: 00m 56s)
  • 11:44 effie: Upgrade wtp* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata for write both for term store (T225055) (duration: 00m 52s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SECURITY: Don't allow Wikimedia sysops to see who had 2FA disabled (duration: 00m 53s)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9599 and previous config saved to /var/cache/conftool/dbconfig/20191112-104400-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9598 and previous config saved to /var/cache/conftool/dbconfig/20191112-103641-marostegui.json
  • 10:35 onimisionipe: resetting cronfile on wdqs hosts
  • 10:33 marostegui: Drop labtestwiki database from m5 master db1133 - T236010
  • 10:30 marostegui: Deploy schema change on dbstore1003:3315
  • 10:07 ema: repool cp3065, nothing interesting in kern.log and SEL T238032
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9596 and previous config saved to /var/cache/conftool/dbconfig/20191112-095221-marostegui.json
  • 09:42 marostegui: Remove privileges for labtestwiki on m5 - T236010
  • 09:27 gehel: restarting blazegraph on wdqs1004
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083', diff saved to https://phabricator.wikimedia.org/P9595 and previous config saved to /var/cache/conftool/dbconfig/20191112-091706-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for mariadb upgrade to 10.1.39 - T234800', diff saved to https://phabricator.wikimedia.org/P9594 and previous config saved to /var/cache/conftool/dbconfig/20191112-091158-marostegui.json
  • 09:11 marostegui: Upgrade mariadb to 10.1.39 on db1083 (candidate master for s1)
  • 08:56 moritzm: restarting archiva to pick up Java security updates
  • 08:44 volker-e@deploy1001: Finished deploy [design/style-guide@3de6820]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:44 volker-e@deploy1001: Started deploy [design/style-guide@3de6820]: Deploy design/style-guide:
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9593 and previous config saved to /var/cache/conftool/dbconfig/20191112-083720-marostegui.json
  • 08:37 gehel: depool wdqs1004 to investigate update lag
  • 08:35 moritzm: installing poppler security updates
  • 08:24 volker-e@deploy1001: Finished deploy [design/style-guide@b926b95]: Deploy design/style-guide: (duration: 00m 07s)
  • 08:24 volker-e@deploy1001: Started deploy [design/style-guide@b926b95]: Deploy design/style-guide:
  • 08:15 moritzm: installing curl security updates
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9592 and previous config saved to /var/cache/conftool/dbconfig/20191112-081322-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9591 and previous config saved to /var/cache/conftool/dbconfig/20191112-074006-marostegui.json
  • 07:36 elukey: remove /etc/logrotate.d/wdqs_autodeployment_log from wdqs1009 (not in puppet anymore and causing cronspam)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9590 and previous config saved to /var/cache/conftool/dbconfig/20191112-072823-marostegui.json
  • 07:10 marostegui: Upgrade kernel on db1083 (s1 candidate master)
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade - T234800', diff saved to https://phabricator.wikimedia.org/P9589 and previous config saved to /var/cache/conftool/dbconfig/20191112-070436-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:44 marostegui: Change triggers on s5 db2094 - T234704
  • 06:40 marostegui: Deploy schema change on s5 codfw with replication, this will generate lag on s5 codfw T233135 T234066
  • 06:21 marostegui: Compress db2087:3316, db2087:3317 T235599
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for compression - T235599', diff saved to https://phabricator.wikimedia.org/P9588 and previous config saved to /var/cache/conftool/dbconfig/20191112-061959-marostegui.json
  • 03:41 vgutierrez: restart wdqs-blazegraph on wdqs1004

2019-11-11

  • 22:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 22:49 ema: power-cycle cp3065, currently down
  • 19:36 XioNoX: disable ALGs on mr1-esams
  • 18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 00m 57s)
  • 18:19 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 15m 14s)
  • 18:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 17:44 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:44 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:42 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 14:26 ema: pool cp3050 with ATS backend T227432
  • 13:50 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:25 ema: depool cp3050 and reimage as text_ats T227432
  • 12:59 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 12:46 effie: Upgrade to 7.2.24-1 mwdebug[2001-2002].codfw.wmnet,mwmaint2001.codfw.wmnet,deploy2001.codfw.wmnet - T237239
  • 12:31 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010 (duration: 00m 28s)
  • 12:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010
  • 12:28 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 12:21 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T231881
  • 11:55 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:52 hoo: Updated the Wikidata property suggester with data from the 2019-11-04 JSON dump and applied the T132839 workarounds
  • 10:48 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:47 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:32 vgutierrez: restarting ats-tls on cp1088
  • 10:21 jynus: upgrade mariadb on db2102
  • 10:16 ema: repool cp4027 after successful X-Wikimedia-Debug testing P9585 T237687
  • 10:12 jynus: manually run full backup of labtestpuppetmaster2001 T235819
  • 09:41 ema: test x-wikimedia-debug-routing.lua on cp4027 (depooled) T237687
  • 09:09 volker-e@deploy1001: Finished deploy [design/style-guide@0ea65f2]: Deploy design/style-guide: (duration: 00m 07s)
  • 09:09 volker-e@deploy1001: Started deploy [design/style-guide@0ea65f2]: Deploy design/style-guide:
  • 08:28 marostegui: Stop MySQL on db2048 before decommissioning - T237913
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2048 from config T237913 (duration: 00m 51s)
  • 08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2048 from config T237913 (duration: 00m 54s)
  • 08:21 marostegui: Remove db2048 from tendril and zarcillo T237913
  • 06:56 elukey: delete /etc/logrotate.d/wdqs-reload-categories from wdqs* as an attempt to reduce cronspam
  • 06:44 marostegui: Delete globalblocks table from napwikisource T230055
  • 05:27 vgutierrez: Switch from nginx to ats-tls on cp3058 - T231627

2019-11-09

  • 20:25 reedy@deploy1001: Synchronized langlist-labs: T237823 (duration: 00m 54s)
  • 02:39 volker-e@deploy1001: Finished deploy [design/style-guide@d2bfc09]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:39 volker-e@deploy1001: Started deploy [design/style-guide@d2bfc09]: Deploy design/style-guide:
  • 01:07 volker-e@deploy1001: Finished deploy [design/style-guide@ef82b69]: Deploy design/style-guide: (duration: 00m 07s)
  • 01:07 volker-e@deploy1001: Started deploy [design/style-guide@ef82b69]: Deploy design/style-guide:
  • 01:06 volker-e@deploy1001: Finished deploy [design/style-guide@97fb3ee]: Deploy design/style-guide: (duration: 00m 09s)
  • 01:06 volker-e@deploy1001: Started deploy [design/style-guide@97fb3ee]: Deploy design/style-guide:

2019-11-08

  • 20:26 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation request jobs by 5 mins for testing (duration: 00m 52s)
  • 16:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "MachineVision: Enable testers-only mode on testcommonswiki for debugging" (duration: 00m 54s)
  • 15:57 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118, db1106 at 100%', diff saved to https://phabricator.wikimedia.org/P9582 and previous config saved to /var/cache/conftool/dbconfig/20191108-155700-jynus.json
  • 15:37 herron: beginning rolling service restarts on logstash hosts for java security updates
  • 15:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enable testers-only mode on testcommonswiki for debugging (duration: 00m 52s)
  • 14:56 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:55 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9581 and previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
  • 14:42 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: stop and upgrade percona-server on test host db1114
  • 13:27 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:12 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9580 and previous config saved to /var/cache/conftool/dbconfig/20191108-131257-jynus.json
  • 13:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee2027c: Change the language of Votewiki back to English (en) (T230614) (duration: 00m 54s)
  • 12:34 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:14 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 10%', diff saved to https://phabricator.wikimedia.org/P9578 and previous config saved to /var/cache/conftool/dbconfig/20191108-121444-jynus.json
  • 12:02 jynus: update and restart db1118
  • 12:01 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1118 fully', diff saved to https://phabricator.wikimedia.org/P9577 and previous config saved to /var/cache/conftool/dbconfig/20191108-120138-jynus.json
  • 11:55 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9576 and previous config saved to /var/cache/conftool/dbconfig/20191108-115553-jynus.json
  • 11:27 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9575 and previous config saved to /var/cache/conftool/dbconfig/20191108-112733-jynus.json
  • 11:25 jynus@cumin1001: dbctl commit (dc=all): 'repool db2130', diff saved to https://phabricator.wikimedia.org/P9574 and previous config saved to /var/cache/conftool/dbconfig/20191108-112503-jynus.json
  • 11:12 jynus: update and restart db2130
  • 11:11 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2116, depool db2130', diff saved to https://phabricator.wikimedia.org/P9573 and previous config saved to /var/cache/conftool/dbconfig/20191108-111125-jynus.json
  • 10:58 Amir1: running rebuildItemTerms on 8028 items (T234329)
  • 10:51 jynus: update and restart db2116
  • 10:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2103, depool db2116', diff saved to https://phabricator.wikimedia.org/P9572 and previous config saved to /var/cache/conftool/dbconfig/20191108-105013-jynus.json
  • 10:38 jynus: update and restart db2103
  • 10:34 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephmon[1-3] T228102
  • 10:33 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephosd[1-3] T224188
  • 10:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2092, depool db2103', diff saved to https://phabricator.wikimedia.org/P9571 and previous config saved to /var/cache/conftool/dbconfig/20191108-103218-jynus.json
  • 10:19 jynus: update and restart db2092
  • 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2071, depool db2092', diff saved to https://phabricator.wikimedia.org/P9570 and previous config saved to /var/cache/conftool/dbconfig/20191108-101759-jynus.json
  • 10:09 elukey: restart jvm-based hadoop daemons on an-master100[1,2] to pick up the new openjdk version
  • 10:06 jynus: update and restart db2071
  • 10:03 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P9569 and previous config saved to /var/cache/conftool/dbconfig/20191108-100310-jynus.json
  • 10:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2072', diff saved to https://phabricator.wikimedia.org/P9568 and previous config saved to /var/cache/conftool/dbconfig/20191108-100128-jynus.json
  • 09:50 moritzm: uploaded openjdk 8u232-b09-1~deb10u1 to component/jdk8 for apt.wikimedia.org/buster-wikimedia
  • 09:41 jynus: update and restart db2072
  • 09:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9567 and previous config saved to /var/cache/conftool/dbconfig/20191108-094100-jynus.json
  • 09:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9566 and previous config saved to /var/cache/conftool/dbconfig/20191108-093958-jynus.json
  • 09:35 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 09:29 jynus: update and restart db2094
  • 09:27 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9565 and previous config saved to /var/cache/conftool/dbconfig/20191108-092735-jynus.json
  • 09:10 jynus: update and restart db1106
  • 09:08 moritzm: installing Java security updates on kafka-jumbo
  • 09:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 fully', diff saved to https://phabricator.wikimedia.org/P9564 and previous config saved to /var/cache/conftool/dbconfig/20191108-090746-jynus.json
  • 09:05 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 09:04 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9563 and previous config saved to /var/cache/conftool/dbconfig/20191108-090451-jynus.json
  • 09:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9562 and previous config saved to /var/cache/conftool/dbconfig/20191108-090012-jynus.json
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:52 jynus: stop and upgrade db1124 (may create temporary lag on wikireplicas)
  • 08:31 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:23 elukey: restart kafka on kafka-jumbo1001 to test the new openjdk
  • 08:07 moritzm: installing fribidi security updates on Buster
  • 03:03 vgutierrez: Switch from nginx to ats-tls on cp3054 - T231627
  • 02:42 vgutierrez: Switch from nginx to ats-tls on cp3052 - T231627
  • 01:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GlobalBlocking/: Prevent some extra db queries (duration: 00m 53s)
  • 01:14 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Use internationalized semicolon separators (T233649) (duration: 00m 53s)
  • 01:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic (duration: 03m 04s)
  • 01:06 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic
  • 00:44 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logging.js: Fix homepage instrumentation (T237600) (duration: 00m 52s)
  • 00:40 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes: Sync DiffEngine changes that were needed to unbreak CI (duration: 00m 55s)
  • 00:34 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Semicolon should appear after log entries (T237500) (duration: 00m 53s)
  • 00:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix remote API configs for GrowthExperiments (duration: 00m 51s)
  • 00:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable suggested edits as hidden preference on arwiki, cswiki, kowiki, viwiki (T236968) (duration: 00m 53s)

2019-11-07

  • 23:49 foks: removing one file for legal compliance
  • 23:47 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert phatality again (duration: 03m 04s)
  • 23:44 shdubsh: start elasticsearch on logstash1008
  • 23:44 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert phatality again
  • 23:41 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: one more time (duration: 03m 00s)
  • 23:38 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: one more time
  • 23:31 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout (duration: 03m 02s)
  • 23:28 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout
  • 23:23 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert to previous phatality plugin version (duration: 02m 55s)
  • 23:20 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert to previous phatality plugin version
  • 23:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 00m 06s)
  • 23:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 23:04 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 06m 48s)
  • 23:00 XenoRyet: updated payments-wiki from aac3d93f70 to bd907656fb
  • 22:57 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 22:53 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes (duration: 00m 05s)
  • 22:53 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes
  • 22:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Remove annotation job delay (duration: 00m 53s)
  • 22:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions (duration: 00m 06s)
  • 22:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions
  • 21:54 andrewbogott: rebuilding labtestpuppetmaster2001 w/Stretch
  • 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
  • 21:28 mutante: boron apt-get clean (saved 9G on /) (T237649)
  • 20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.5 refs T233853
  • 20:24 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ArticleTarget.js: Fix error handling (duration: 01m 00s)
  • 20:21 herron: performing rolling reboots of kafka-main hosts for security updates
  • 20:17 onimisionipe: cluster restart for cloudelastic to pick up JVM upgrade
  • 20:08 eileen: civicrm revision changed from f1ce5c86f7 to bfa53ee611, config revision is 72d2692743
  • 19:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enqueue annotation job on upload complete (duration: 05m 19s)
  • 18:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Disable retrying annotation requests (duration: 05m 17s)
  • 18:25 ebernhardson: restart mjolnir-kafka-bulk-daemon and mjolnir-kafka-msearch-daemon across `cirrus` dsh group
  • 18:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration (duration: 05m 49s)
  • 18:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration
  • 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:38 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:30 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:25 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Drop currently unsupported external dependencies (T227349) (duration: 05m 19s)
  • 17:10 XioNoX: Homer push - forwarding-options - to all cr
  • 17:09 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:08 XioNoX: add sampling stanza (disabled) to cr2-esams
  • 17:00 mutante: wtp2020 - 2 hours downtime - shut down (T205712) - go ahead @papaul
  • 17:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 16:58 mutante: wtp2020 - depooled for T205712
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp2020.codfw.wmnet
  • 16:42 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: some alphasorted config (duration: 01m 00s)
  • 16:34 XioNoX: Homer push on cr2-knams: Sampling (disabled), enhanced-hash-key, ospf interfaces re-ordering (noop), policy-statement BGP_from_LVS (unused), lo0 term allow_vmhost
  • 16:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 100%', diff saved to https://phabricator.wikimedia.org/P9553 and previous config saved to /var/cache/conftool/dbconfig/20191107-163235-jynus.json
  • 16:20 XioNoX: add BGP sessions to AS64050 in eqiad
  • 16:15 XioNoX: add BGP sessions to AS57695 in esams and eqiad
  • 16:12 XioNoX: clear v4 BGP sessions to AS7713 in eqsin (hit max prefix limit)
  • 16:02 mutante: mw2225 restart cron (T236799)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta logging (duration: 01m 00s)
  • 15:41 XioNoX: remove BGP to AS3491 on eqiad (left the IX)
  • 15:40 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:53 jbond42: rebuilding compiler1001
  • 13:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 50%', diff saved to https://phabricator.wikimedia.org/P9551 and previous config saved to /var/cache/conftool/dbconfig/20191107-135018-jynus.json
  • 12:47 Urbanecm: EU SWAT done
  • 12:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 8e71601: a36ed85: GrowthExperiments: Configure testwiki for suggested edits testing + follow up patch (T237634) (duration: 00m 59s)
  • 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 19034af: GrowthExperiments: Configure intro links for suggested edits (T235723) (duration: 01m 00s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2be3f86: [cirrus] remove cross_cluster_single_shard_search quirk (duration: 01m 02s)
  • 12:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5253dec: Give commonswiki filemovers `suppressredirect` rights (T236348) (duration: 01m 03s)
  • 11:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 fully (duration: 01m 01s)
  • 11:54 jbond42: update puppet_version used by CI 545289
  • 11:50 jbond42: rebuilding compiler1002
  • 11:36 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 10%', diff saved to https://phabricator.wikimedia.org/P9550 and previous config saved to /var/cache/conftool/dbconfig/20191107-113611-jynus.json
  • 11:16 jynus: stop and upgrade db1080
  • 10:58 moritzm: installing Java security updates on kafka-main/logstash
  • 10:50 moritzm: installing Java security updates on wdqs/maps
  • 10:46 jynus@cumin1001: dbctl commit (dc=all): 'Fully depool db1080', diff saved to https://phabricator.wikimedia.org/P9549 and previous config saved to /var/cache/conftool/dbconfig/20191107-104618-jynus.json
  • 10:28 moritzm: upgrading mw1277-1279 servers to PHP 7.2.24 T237239
  • 10:27 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1080 weight', diff saved to https://phabricator.wikimedia.org/P9548 and previous config saved to /var/cache/conftool/dbconfig/20191107-102747-jynus.json
  • 09:41 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 with low weight (duration: 01m 02s)
  • 09:30 jynus: stop and upgrade es1016
  • 09:18 moritzm: installing Java security updates on aqs/druid/Hadoop
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1016 (duration: 01m 04s)
  • 09:03 jynus: stop and upgrade es2012, es2014
  • 08:48 jynus: stop and upgrade es2011
  • 08:30 jynus: upgrade and restart db2093
  • 00:21 XioNoX: enable interface damping on primary eqsin-codfw link - T236878
  • 00:09 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/549227 (duration: 01m 00s)
  • 00:00 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 04m 29s)

2019-11-06

  • 23:56 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 23:55 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 14m 56s)
  • 23:40 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 22:36 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on commonswiki (T227349)
  • 22:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 22:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on commonswiki (T227349) (duration: 01m 00s)
  • 22:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation jobs on commonswiki only (duration: 01m 01s)
  • 22:17 mdholloway: created MachineVision extension tables on commonswiki
  • 22:13 XioNoX: push standard forwarding-options to cr3/4-ulsfo
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 22:04 mholloway-shell@deploy1001: Synchronized private/PrivateSettings.php: Configure Google Cloud Vision API credentials (2/2) (T236426) (duration: 00m 59s)
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1247.eqiad.wmnet
  • 22:03 mholloway-shell@deploy1001: Synchronized private/GoogleCloudVision.php: Configure Google Cloud Vision API credentials (1/2) (T236426) (duration: 00m 59s)
  • 21:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Allow specifying API credentials as an associative array (T236426) (duration: 01m 01s)
  • 21:53 thcipriani: checkout /srv/mediawiki-staging/php-1.35.0-wmf.5/maintenance/Maintenance.php; looks like a local debugging change was left behind
  • 21:47 arlolra: Updated Parsoid to 1d283ed (T237104, T227209, T236865)
  • 21:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed (duration: 10m 22s)
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1247.eqiad.wmnet
  • 21:14 XioNoX: push standard forwarding-options to cr3-esams
  • 21:12 milimetric@deploy1001: Finished deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns (duration: 10m 52s)
  • 21:01 milimetric@deploy1001: Started deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns
  • 20:36 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/OpenStackManager/: sync openstackmanager to deploy https://gerrit.wikimedia.org/r/#/q/I5b08f0069941052acdd9f05a62aac5b2cf9ecdd5 (duration: 01m 00s)
  • 20:34 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.5 refs T233853 (duration: 01m 00s)
  • 20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.5 refs T233853
  • 19:05 mutante: mw1225 - re-enabling puppet (no reason given, nothing in SAL or Phab but disabled)
  • 18:43 mutante: LDAP - add dwisehaupt to wmf group (T235676)
  • 18:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix typo (T222117) (duration: 01m 00s)
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Instrument logging to ClosedWikiProvider (T222117) (duration: 01m 01s)
  • 17:22 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1126 weight, too much backlog', diff saved to https://phabricator.wikimedia.org/P9542 and previous config saved to /var/cache/conftool/dbconfig/20191106-172235-jynus.json
  • 17:21 ejegg: turned off donation queue consumer for financial_trxn record fix
  • 17:17 ejegg: updated Fundraising CiviCRM from 1c3be265ae to f1ce5c86f7
  • 17:15 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 fully (duration: 00m 59s)
  • 17:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WebAuthn extension if wmgUseWebAuthn is set (false in all of production) T227242 (duration: 01m 00s)
  • 17:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseWebAuthn false in all of production T227242 (duration: 01m 01s)
  • 17:08 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 fully', diff saved to https://phabricator.wikimedia.org/P9541 and previous config saved to /var/cache/conftool/dbconfig/20191106-170852-jynus.json
  • 16:11 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on testcommonswiki (T227349)
  • 15:58 mdholloway: created MachineVision tables on testcommonswiki (T227349)
  • 15:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure MachineVision and enable on testcommonswiki (T227349) (duration: 01m 00s)
  • 15:47 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: MachineVision: Use an HTTP proxy in production (T236843) (duration: 01m 01s)
  • 15:42 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Do not restrict to testing users on Beta (duration: 01m 00s)
  • 15:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Fix Beta config with updated service name (duration: 01m 02s)
  • 14:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 with low weight (duration: 00m 59s)
  • 14:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable streaks and revert counts (T234955, T234956) (duration: 01m 00s)
  • 14:27 jynus: upgrade and restart es1019
  • 14:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 01m 00s)
  • 14:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 50%', diff saved to https://phabricator.wikimedia.org/P9539 and previous config saved to /var/cache/conftool/dbconfig/20191106-140702-jynus.json
  • 12:38 Urbanecm: EU SWAT done
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 2/2) (duration: 01m 00s)
  • 12:36 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 1/2) (duration: 00m 59s)
  • 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3e9ede0: Add 104 (Cookbook) to $wgContentNamespaces for bnwikibooks (T236840) (duration: 01m 00s)
  • 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5875c45: [cirrus] Disable instant indexing on wikidata (duration: 01m 15s)
  • 11:57 jynus: upgrade and restart db2048
  • 11:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 10%', diff saved to https://phabricator.wikimedia.org/P9537 and previous config saved to /var/cache/conftool/dbconfig/20191106-113510-jynus.json
  • 11:14 jynus: stopping db1074 for maintenance (will create temporary s2 lag on wikireplicas)
  • 11:06 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P9536 and previous config saved to /var/cache/conftool/dbconfig/20191106-110603-jynus.json
  • 09:46 moritzm: upgrading mw1262-mw1265,mw1276 servers to PHP 7.2.24 T237239
  • 09:33 jynus: stop and upgrade labsdb1011 T236015
  • 09:25 jynus: depooling labsdb1011 for wikireplica service T236015
  • 09:10 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 08:58 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 08:51 jynus: upgrading wmf-mariadb101-client on cumin hosts
  • 08:51 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.24 T237239
  • 08:33 jynus: upgrading db2102 mariadb (test-s1)
  • 07:48 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 07:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 02:59 vgutierrez: Switch from nginx to ats-tls on cp5012 - T231627
  • 00:07 mdholloway: created table wikimedia_editor_tasks_edit_streak on x1/wikishared (T234956)

2019-11-05

  • 23:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.5 refs T233853
  • 23:25 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.5 refs T233853 (duration: 24m 13s)
  • 23:01 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:51 twentyafterfour@deploy1001: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2905573311"/* "/srv/mediawiki-staging/php-1.35.0-wmf.5/cache/l10n"' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:50 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:39 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2076118383" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:38 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:17 twentyafterfour: scap failed with error: A copy of your installation's LocalSettings.php must exist and be readable in the source directory. Use --conf to specify it. refs T233853
  • 22:09 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_840646293" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 04m 54s)
  • 22:04 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:03 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr2-esams:lo0.0
  • 21:58 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr3-esams:lo0.0
  • 20:45 mutante: shutting down cobalt (formerly gerrit server)
  • 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:33 XioNoX: push fw policies to pfw3-eqiad - T236201
  • 20:23 XioNoX: push fw policies to pfw3-codfw - T236201
  • 20:17 joal@deploy1001: Finished deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch (duration: 08m 21s)
  • 20:09 joal@deploy1001: Started deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade (duration: 08m 49s)
  • 20:00 joal@deploy1001: Started deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade
  • 18:40 twentyafterfour: MediaWiki train: start branching wmf/1.35.0-wmf.5
  • 18:30 XioNoX: fix typo on cr1-eqsin:lo0.0 v6 IP
  • 18:27 ejegg: updated payments-wiki from 0de9d96208 to aac3d93f70
  • 17:21 jynus: restarting etherpad
  • 16:56 arturo: deleted stretch-wikimedia/thirdparty/kubeadm-k8s and created buster-wikimedia/thirdparty/kubeadm-k8s
  • 16:24 papaul: Replacing disk on db2120
  • 15:37 jynus: deploying schema change on x1 T234955
  • 15:20 ema: cp4027: upgrade trafficserver to 8.0.5-1wm10
  • 14:37 jynus: reducing consistency temporarily on db1114 so it can catch up replication
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 ema: pool cp5012 with ATS backend T227432
  • 10:45 vgutierrez: restarting atsmtail@backend on cp5006
  • 09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 ema: wb2-phab stopped saying things a while ago. Restarted
  • 09:18 jynus: restart dbprov100[12] T236924
  • 09:11 jynus: restart dbprov2001 T236924
  • 08:12 vgutierrez: uploaded fifo-log-demux 0.6 to apt.wikimedia.org (stretch)
  • 08:02 jynus: redact mnwwiki on db1124 and db2094 T235743
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp5011 - T231627
  • 04:13 vgutierrez: Switch from nginx to ats-tls on cp5010 - T231627
  • 03:51 vgutierrez: pooling cp3057 - T237348
  • 03:46 mutante: wdqs1004 restarting wdqs-blazegraph
  • 03:01 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 02:59 vgutierrez: depool cp3057 - T237348
  • 00:15 mutante: gerrit - restarting service to re-enable jgit gc (T217497)
  • 00:13 mutante: gerrit2001 - restart gerrit (replica)

2019-11-04

  • 23:18 milimetric@deploy1001: Finished deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs (duration: 07m 20s)
  • 23:11 milimetric@deploy1001: Started deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs
  • 23:05 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:03 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:08 bd808: The Wikimedia SAL Twitter feed is now @wikimedia_sal (https://twitter.com/wikimedia_sal) T237322
  • 20:51 bd808: Testing twitter feed following account confirmation
  • 19:23 Urbanecm: Morning SWAT done
  • 19:17 mutante: cobalt - stopping services, removing apache2
  • 19:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6a4b966: Add throttle rule for bard college editathon (T236955) (duration: 00m 54s)
  • 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9204768: Enable DNS blacklist for es.wikinews (T237151) (duration: 00m 53s)
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0fc3909: Allow FlaggedRevs autoreview permission to be assigned globally (duration: 00m 54s)
  • 18:30 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode (duration: 03m 27s)
  • 18:26 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode
  • 18:24 ppchelko@deploy1001: Finished deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902 (duration: 14m 30s)
  • 18:17 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts (duration: 12m 07s)
  • 18:09 ppchelko@deploy1001: Started deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902
  • 18:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts
  • 17:41 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Update for YAML-reading (offline) (duration: 00m 52s)
  • 17:39 jforrester@deploy1001: Synchronized wmf-config/config/: Sync out YAML config files (duration: 00m 56s)
  • 15:43 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable revert counts on beta (T234955) (duration: 00m 53s)
  • 15:36 jynus: running failing check_private_data report on labsdb1009 T235743
  • 15:33 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 00m 59s)
  • 15:01 joal@deploy1001: Started restart [analytics/aqs/deploy@59a97fa]: (no justification provided)
  • 14:36 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:36 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 ema: upload trafficserver 8.0.5-1wm10 to stretch-wikimedia
  • 13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:47 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 elukey: update bacula terms on analytics-in{4,6} filters on cr{1,2}-eqiad - T237016
  • 13:28 jbond42: update production puppetmasters to use new puppetdb servers
  • 13:20 Amir1: Creating Mon Wikipedia is done T235739
  • 13:19 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 13:16 ladsgroup@deploy1001: Synchronized langlist: T235739 (duration: 00m 52s)
  • 13:15 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T235739 (duration: 00m 53s)
  • 13:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T235739 (duration: 00m 53s)
  • 13:13 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T235739 (duration: 00m 52s)
  • 13:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T235739
  • 13:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 53s)
  • 13:06 ema: depool cp5012 and reimage as text_ats T227432
  • 12:21 Urbanecm: EU SWAT done
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 2/2) (duration: 00m 52s)
  • 12:12 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki* (T236905)
  • 12:11 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 1/2) (duration: 00m 53s)
  • 12:08 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: a6d64b1: Update logo for zh-classical Wikipedia (T236905) (duration: 00m 53s)
  • 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c92a13c: Enable partial blocks on kowiki (T236752) (duration: 00m 54s)
  • 12:00 moritzm: upgrading mw1261 to PHP 7.2.24 (T237239)
  • 11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 11:08 moritzm: uploaded PHP 7.2.24 to apt.wikimedia.org stretch-wikimedia/component/php72 (T237239)
  • 04:53 vgutierrez: Switch from nginx to ats-tls on cp5009 - T231627
  • 04:39 vgutierrez: Switch from nginx to ats-tls on cp5008 - T231627

2019-11-03

  • 03:54 andrew@deploy1001: Finished deploy [horizon/deploy@0c024d4]: one more prefix fix (duration: 03m 35s)
  • 03:50 andrew@deploy1001: Started deploy [horizon/deploy@0c024d4]: one more prefix fix
  • 03:10 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try) (duration: 00m 25s)
  • 03:10 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try)
  • 03:09 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (duration: 06m 01s)
  • 03:03 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation

2019-11-02

  • 00:58 mutante: gerrit-replica - created missing /var/lib/gerrit2/review_site/tmp and restarted service - service back up on buster (T176774)
  • 00:34 mutante: gerrit-replica - fixing permissions of files in /srv/gerrit and restarting
  • 00:27 mutante: gerrit2001 - copy mysql-connector-java.jar into /usr/share/java/ and link it into /var/lib/gerrit2/review_site/lib (T176774)
  • 00:05 mutante: rsyncing gerrit plugin dir from gerrit1001 to gerrit2001 (T176774)

2019-11-01

  • 23:45 mutante: rsyncing gerrit git data from gerrit1001 to gerrit2001 (using --delete too!) T176774
  • 22:00 mutante: gerrit - repo sync between gerrit and gerrit-replica in progress .. if you can't clone from replica you can use main gerrit and replica will come back
  • 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: T237126 Fixing DOM in upload interface of UploadWizard (duration: 00m 56s)
  • 21:06 mutante: scp /usr/share/java/mysql-connector-java.jar from gerrit1001 to gerrit2001 (T176774)
  • 20:46 cdanis: add to bot_blocked_nets the IPs of several EC2 instances sending expensive requests to ORES T237134
  • 19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mutante: gerrit2001 - reinstalling with buster
  • 19:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration (duration: 00m 11s)
  • 19:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration
  • 16:39 XioNoX: push Add BGP_from_LVS policy and term vmhost to loopback4 filter to CRs
  • 16:37 ema: pool cp5011 with ATS backend T227432
  • 16:16 XioNoX: asw2-a-eqiad# run request system license add terminal
  • 15:39 moritzm: installing libonig security updates
  • 15:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 moritzm: installing libpcap security updates
  • 15:11 moritzm: installing python-ecdsa security updates
  • 14:34 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 ema: depool cp5011 and reimage as text_ats T227432
  • 14:02 moritzm: rebooting kafka-main1004 for microcode tests
  • 14:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:56 moritzm: upgrading mwdebug2002 to PHP 7.2.24 for some smoke tests with the new build
  • 12:18 ema: pool cp5010 with ATS backend T227432
  • 11:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 ema: depool cp5010 and reimage as text_ats T227432
  • 11:08 effie: enable puppet mediawiki and prometheus servers
  • 10:54 effie: remove prometheus-hhvm-exporter package from mw* servers - T229792
  • 10:37 moritzm: installing clamav security updates on mendelevium
  • 10:33 effie: Disable puppet on mediawiki and prometheus servers to remove hhvm exporters - T229792
  • 09:28 moritzm: installing file security updates on jessie
  • 09:21 effie: depool mw1317
  • 09:19 moritzm: installing golang-1.11 security updates
  • 08:57 moritzm: installing ruby-loofah security updates
  • 08:17 moritzm: installing libarchive security updates
  • 01:58 volker-e@deploy1001: Finished deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements (duration: 00m 05s)
  • 01:58 volker-e@deploy1001: Started deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements
  • 01:21 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/resources/src/mediawiki.widgets/mw.widgets.UsersMultiselectWidget.js: T236460 mw.widgets.UsersMultiselectWidget: Fix property name (duration: 00m 54s)

2019-10-31

  • 23:33 Urbanecm: Evening SWAT done
  • 23:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice/extension.json: SWAT: dcd3ec3: Fix error in CentralNoticeImpression schema (T236627) (duration: 00m 51s)
  • 23:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/VisualEditor/: SWAT: 3686b82: Revert "Parse relative hrefs on image nodes like on regular links" (T237040) (duration: 00m 53s)
  • 23:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 02bf4b8: Re-enable mobile editor A/B testing (T236337) (duration: 00m 52s)
  • 23:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki* (T237035)
  • 23:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 54ee973: Change bawiki logo to an anniversary one (T237035) (duration: 00m 53s)
  • 23:04 eileen: civicrm revision changed from d2045c6b98 to 1183915bde, config revision is 1a709a61aa
  • 23:00 mutante: replacing deployment keys for apache2secmod ; re-arming keyholder on deployment server
  • 22:51 XioNoX: Homer push to cr1/2-eqiad
  • 22:17 XioNoX: Homer push to cr1/2-codfw
  • 22:14 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 00m 06s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:12 mutante: vega sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:12 twentyafterfour@deploy1001: deploy aborted: testing deploy_design (duration: 05m 07s)
  • 22:12 mutante: bromine sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:05 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 01m 30s)
  • 22:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 21:59 mutante: deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677)
  • 21:49 mutante: deploy1001 keyholder restart, keyholder arm ...
  • 21:46 mutante: deploy1001 - move apach2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677)
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 (duration: 13m 44s)
  • 21:25 robh: setting up ps1-b8-eqiad per T227543. it will reboot twice in the next 15 minutes, and then should start to clear up in icinga
  • 21:18 ppchelko@deploy1001: Started deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902
  • 20:35 XioNoX: Homer push to all cr2-eqdfw - new NTP servers, remove border-in4 term unused-ips, add (unused) BGP_Wikimedia_pops, re-order ospf interfaces
  • 20:27 shdubsh: restarting logstash on logstash1008 to test level->severity filter selector
  • 20:12 XioNoX: Homer push to all msw* - new NTP servers - T237011
  • 20:07 XioNoX: Homer push to all asw* - new NTP servers - T237011
  • 19:49 XioNoX: Homer push to eqsin
  • 19:49 mutante: rsyncing home dirs from previous gerrit server cobalt to gerrit1001
  • 19:36 fdans@deploy1001: Finished deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt (duration: 06m 53s)
  • 19:31 XioNoX: Homer push to ulsfo
  • 19:29 fdans@deploy1001: Started deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt
  • 19:08 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.4
  • 18:22 Urbanecm: Morning SWAT done
  • 18:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice: SWAT: 3e5b33f: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 00m 55s)
  • 18:20 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/CentralNotice: SWAT: 963e963: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 01m 01s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe08fbb: Undeploy reader surveys in English, Polish, and Russian (T232525) (duration: 01m 02s)
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@8ca04df]: deploying refinery (duration: 01m 09s)
  • 18:00 fdans@deploy1001: Started deploy [analytics/refinery@8ca04df]: deploying refinery
  • 16:23 bd808: Our @wikimediatech Twitter account is soft blocked pending phone number verification. bd808 trying to figure out a good way to do that verification for a bot account.
  • 16:14 jynus: restart dbprov2002 after upgrade T236924
  • 16:09 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 100%', diff saved to https://phabricator.wikimedia.org/P9513 and previous config saved to /var/cache/conftool/dbconfig/20191031-160925-jynus.json
  • 15:28 jgleeson: Updated paymentswiki from e28bc54e85 to 0de9d96208
  • 14:56 Urbanecm: Password reset for SUL user `Darth AK`
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119 at 10%', diff saved to https://phabricator.wikimedia.org/P9512 and previous config saved to /var/cache/conftool/dbconfig/20191031-145010-jynus.json
  • 14:28 jynus: reloading ferm on db1119
  • 14:24 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P9511 and previous config saved to /var/cache/conftool/dbconfig/20191031-142455-jynus.json
  • 13:40 effie: upload xdebug 2.7.0-1+wmf2 to component/php72 - T234418
  • 13:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s)
  • 13:16 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json
  • 11:48 jynus: setting pc1008 as a replica of active pc1010
  • 11:43 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s)
  • 11:37 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json
  • 11:24 Urbanecm: EU SWAT done
  • 11:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/ProofreadPage/: SWAT: e0d5ce9: Add page navigation tabs in correct order skin-side and remove js requirement for Vector tab icons (T231250); ed17da2: Makes sure that Vector default background does not override the navigation arrows (T236969) (duration: 01m 02s)
  • 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 547086|Enable ContentTranslation out of Beta in Albanian WP (T236064) (duration: 01m 02s)
  • 11:03 ema: cp5008: restart ats-be to clear "backend process restarted" alert
  • 11:00 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 godog: bounce logstash on logstash2004
  • 10:39 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:38 ema: pool cp5009 with ATS backend T227432
  • 10:37 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:35 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:30 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:29 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:19 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:13 godog: bounce logstash on logstash2004
  • 10:07 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:05 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:43 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 godog: temporarily stop logstash on logstash2005 to test performance with two ingesters only - T215904
  • 09:23 godog: temporarily stop logstash on logstash2006 to test performance with two ingesters only - T215904
  • 09:10 ema: depool cp5009 and reimage as text_ats T227432
  • 08:25 ariel@deploy1001: Finished deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation (duration: 00m 03s)
  • 08:25 ariel@deploy1001: Started deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation
  • 06:37 elukey: upgrade cergen to 0.2.5 on puppetmaster1001
  • 03:44 vgutierrez: switch from nginx to ats-tls on cp4032 - T231627
  • 03:09 vgutierrez: switch from nginx to ats-tls on cp4031 - T231627
  • 02:51 vgutierrez: switch from nginx to ats-tls on cp4030 - T231627
  • 01:41 eileen: civicrm revision changed from 0547c84f73 to d2045c6b98, config revision is 1a709a61aa (looks like patch was still hung in gerrit last time)
  • 01:34 eileen: civicrm revision is 0547c84f73, config revision is 1a709a61aa - that should stop those failmails
  • 00:40 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/WikiLove/resources/ext.wikiLove.icon.vector.css: T236958 Fix Vector icon after upstream change (duration: 01m 02s)
  • 00:38 eileen: civicrm revision changed from a55c2d2787 to 0547c84f73, config revision is 1a709a61aa

2019-10-30

  • 23:21 ejegg: updated fundraising python tools from ffc7bf764b to a93eec292d
  • 23:08 XioNoX: power cycle cr3-esams re1 - T236598
  • 22:29 mutante: scandium - live hack /srv/mediawiki/wmf-config/InitialiseSettings.php - set wmgMemoryLimit to 850 (*1024 *1024), restart php7.2-fpm (T236833)
  • 22:22 andrew@deploy1001: Finished deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode (duration: 03m 15s)
  • 22:19 andrew@deploy1001: Started deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837 (duration: 13m 54s)
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837
  • 21:31 ppchelko@deploy1001: Finished deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838 (duration: 14m 04s)
  • 21:17 ppchelko@deploy1001: Started deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838
  • 20:47 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:47 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:46 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 20:46 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:42 arlolra: Updated Parsoid to 5ac1623 (T235656, T233818, T234549, T227209, T236112)
  • 20:29 otto@deploy1001: Synchronized wmf-config/LabsServices.php: Syncing LabsServices.php change for beta eventgate instance replacement (duration: 01m 01s)
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623 (duration: 09m 10s)
  • 20:25 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 18s)
  • 20:24 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623
  • 20:17 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: WikimediaEditorTasks: Enable edit streaks on beta (duration: 01m 03s)
  • 20:11 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:11 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:10 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 51s)
  • 20:09 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:07 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 07s)
  • 20:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:06 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 23s)
  • 20:06 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 05s)
  • 20:03 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 19:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 19:06 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.4 (duration: 01m 00s)
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.4
  • 19:05 mutante: moscovium - stop and remove rsync server, purge rsync package T180641
  • 18:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T222851 Migrate to Kask for Echo seen-time storage (duration: 01m 01s)
  • 17:43 elukey: upload cergen 0.2.5-1+deb10u1 to buster-wikimedia component/cergen
  • 17:41 elukey: run reprepro clearvanished on install1002 to clean leftovers of buster-wikimedia|thirdparty/elastic7
  • 17:37 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 17:37 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 17:29 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Revert 16:05 UTC T236928 (duration: 01m 05s)
  • 17:26 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Revert 16:02 UTC T236928 (duration: 01m 04s)
  • 16:59 jynus: killed rebuildItemTerms on mwmaint1002
  • 16:05 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T234948) (duration: 01m 04s)
  • 16:02 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 01m 05s)
  • 15:48 godog: roll restart logstash after https://gerrit.wikimedia.org/r/c/operations/puppet/+/544217
  • 15:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 06s)
  • 15:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 05s)
  • 15:36 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:29 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 15:23 gehel: shutting down elastic1039 to be ready for disk swap - T236601
  • 15:10 effie: enable-puppet in mw* hosts
  • 15:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T210174 Load Wikisource extension when wmgUseWikisource is true (duration: 01m 01s)
  • 14:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236502 Define wmgUseWikisource as default-false (duration: 01m 22s)
  • 14:40 ema: pool cp5008 with ATS backend T227432
  • 14:32 effie: disable puppet on all mw* hosts
  • 14:20 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:39 andrew@deploy1001: Finished deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver (duration: 03m 38s)
  • 13:36 andrew@deploy1001: Started deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver
  • 12:59 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=cp5008.eqsin.wmnet
  • 12:58 moritzm: rolling restart of slapd to pick up LDAP schema change
  • 12:57 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet
  • 12:50 arturo: updating package versions in install1002 for thirdparty/kubeadm-k8s stretch-wikimedia (T236824)
  • 12:23 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:22 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 moritzm: temporarily disabling puppet on LDAP servers for a schema change
  • 11:42 ema: depool cp5008 and reimage as text_ats T227432
  • 11:37 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 11:31 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase rate limits for newbie non-ip users on Commons (duration: 01m 01s)
  • 11:13 Urbanecm: EU SWAT done
  • 11:12 Urbanecm: Synchronized wmf-config/InitialiseSettings.php: SWAT: 61cb77c: Re-apply: MCR: Set testwiki to use the new MCR-only schema (T198558) (duration: 00m 59s)
  • 10:07 jynus: restarting bacula-dir, bacula-sd on backup1001 T236406
  • 09:46 vgutierrez: Switch from nginx to ats-tls on cp4029 - T231627
  • 09:34 vgutierrez: Switch from nginx to ats-tls on cp4028 - T231627
  • 09:25 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 08:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 08:25 moritzm: installing php7.0 security updates
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:57 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 05:58 vgutierrez: Rolling restart of ats-tls to get rid of leaked sockets and benefit from the lower inactivity timeout - T236458
  • 04:24 vgutierrez: restarting ats-tls on cp4027 with half open disabled - T236458
  • 03:09 vgutierrez: Rolling restart of prometheus-exporter-trafficserver-tls - T236458
  • 02:40 vgutierrez: restarting ats-tls on cp3050 with half open disabled - T236458
  • 00:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php

2019-10-29

  • 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 23:09 mutante: ganeti1003 - gnt-instance remove ununpentium.wikimedia.org (T236748)
  • 23:05 Urbanecm: Evening SWAT done
  • 23:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/atjwiki* (T236777)
  • 23:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: f7b9972: Revert "Milestone lobo for atjwiki" (T236777) (duration: 01m 01s)
  • 22:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:17 mutante: ununpentium - shutdown Ganeti VM - running decom script, schedule icinga downtime (T236748)
  • 22:14 mutante: rsynced data dump and config from ununpentium to moscovium in /srv/ before shutting down the old server (T180641)
  • 20:43 papaul: rebooting cp3056 for HW check
  • 20:19 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw complete (T235654)
  • 19:42 andrew@deploy1001: Finished deploy [horizon/deploy@dbe892e]: (no justification provided) (duration: 03m 59s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@dbe892e]: (no justification provided)
  • 19:32 jynus: restarting bacula-fd on install1002 T236406
  • 19:31 andrew@deploy1001: Finished deploy [horizon/deploy@bab5d37]: (no justification provided) (duration: 01m 35s)
  • 19:30 andrew@deploy1001: Started deploy [horizon/deploy@bab5d37]: (no justification provided)
  • 19:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.4
  • 19:14 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache (duration: 21m 11s)
  • 18:54 jynus@cumin1001: dbctl commit (dc=all): 'Revert state to before overload+maintenance', diff saved to https://phabricator.wikimedia.org/P9501 and previous config saved to /var/cache/conftool/dbconfig/20191029-185438-jynus.json
  • 18:53 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache
  • 18:53 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw (T235654)
  • 18:50 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.1 (duration: 08m 09s)
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902 (duration: 14m 13s)
  • 18:07 ppchelko@deploy1001: Started deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902
  • 17:42 brennen: cutting branch for 1.35.0-wmf.4
  • 17:38 mutante: phab1001 - upgrading php7.3 packages
  • 17:34 mutante: phab2001 - upgrading PHP packages
  • 17:06 jynus@cumin1001: dbctl commit (dc=all): 'repool db1099 both instances fully to increase redundancy', diff saved to https://phabricator.wikimedia.org/P9499 and previous config saved to /var/cache/conftool/dbconfig/20191029-170648-jynus.json
  • 16:56 jynus@cumin1001: dbctl commit (dc=all): 'depool fully db1105:3311, stability/lag issues', diff saved to https://phabricator.wikimedia.org/P9498 and previous config saved to /var/cache/conftool/dbconfig/20191029-165633-jynus.json
  • 16:52 ssastry@deploy1001: Finished deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d (duration: 09m 35s)
  • 16:46 jynus@cumin1001: dbctl commit (dc=all): 'pool db1106 into s1 rcs', diff saved to https://phabricator.wikimedia.org/P9497 and previous config saved to /var/cache/conftool/dbconfig/20191029-164640-jynus.json
  • 16:43 ssastry@deploy1001: Started deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d
  • 16:39 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 16:28 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 06m 11s)
  • 16:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 16:22 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 16:20 mutante: reloading nginx on wtp*
  • 15:57 bstorm_: restarted ferm on labstore1006 -- it failed an external DNS lookup due to brief issues apparently on the other end
  • 15:25 vgutierrez: restarting ats-tls on cp5007 with a default inactivity timeout of 5 minutes and half open disabled - T236458
  • 15:04 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 15:01 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 14:58 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 14:45 robh: setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538
  • 14:32 Krinkle: krinkle@webperf1001.eqiad Restart navtiming, coal and statsv services
  • 14:29 elukey: upgrade python-kafka on webperf[12]001 - T234808
  • 14:27 Krinkle: krinkle@webperf2001 Restart navtiming, coal and statsv services
  • 12:32 hashar: Restarting Zuul / Jenkins
  • 12:31 hashar: Stopping Zuul / Jenkins for upgrade
  • 12:29 akosiaris: delete all production00 volumes on backup1001
  • 11:48 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 11:37 Urbanecm: EU SWAT done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: faeb8f1: Allow AbuseFilter to issue blocks on es.wikinews (T236730) (duration: 00m 53s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fc9920e: Rename Author talk namespace at thwikisource (T236640) (duration: 00m 56s)
  • 11:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 11:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 10:51 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jakob@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:39 jakob@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:33 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 10:29 moritzm: installing php5 security updates
  • 10:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 10:21 jynus: running import on m1-master, m1 replicas will lag for a while T236406
  • 10:20 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 XioNoX: disable cr3-esams:et-1/0/0 (flapping)
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 gehel: plugin upgrade on relforge - T236123
  • 09:27 godog: reimage elastic 7 hw with Buster
  • 09:27 vgutierrez: restart ats-tls on cp5007 disabling TCP SO_LINGER - T236458
  • 08:43 jynus: shutting down db1099 T227538
  • 08:35 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1099', diff saved to https://phabricator.wikimedia.org/P9492 and previous config saved to /var/cache/conftool/dbconfig/20191029-083547-jynus.json
  • 08:15 XioNoX: push term allow_vmhost ro cr3-esams loopback4 filter - T236598
  • 08:06 vgutierrez: restarting ats-tls on cp5007 with TCP FASTOPEN disabled - T236458
  • 07:40 moritzm: installing php7.3 security updates
  • 07:06 elukey: roll restart java daemons on analytics1042, druid1003 and aqs1004 to pick up new openjdk upgrades
  • 07:01 _joe_: restart memcached on mc1024-1036, 1 hour apart, via cumin (T235188)
  • 06:26 _joe_: restart memcached on mc1023 T23518
  • 03:35 vgutierrez: restarting varnish-frontend on cp5008

2019-10-28

  • 23:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy Echo kask migration to officewiki for testing, part 3 (T222851) (duration: 00m 52s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy Echo kask migration to officewiki for testing, part 2 (T222851) (duration: 00m 52s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/ProductionServices.php: Deploy Echo kask migration to officewiki for testing, part 1 (T222851) (duration: 00m 54s)
  • 23:18 mutante: re-enabling puppet on moscovium (RT)
  • 22:02 ejegg: re-enabled basic fundraising jobs (Queue consumers, audit processors, TY mailer)
  • 20:56 cdanis: restart memcached on mc1022 T235188
  • 20:37 Jeff_Green: authdns update to switch fundraising db service hostname
  • 20:19 ejegg: disabled all fundraising scheduled jobs
  • 19:50 rlazarus: restarted memcached on mc1021 (T235188)
  • 19:41 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 02m 42s)
  • 19:38 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 18:53 moritzm: updating PHP on people1001
  • 18:52 Urbanecm: Morning SWAT done
  • 18:42 urbanecm@deploy1001: Synchronized wmf-config/logging.php: SWAT: 1a09e2a: Direct Parsoid/PHP logs to a parsoid-php log "type" (T235899) (duration: 00m 52s)
  • 18:41 rlazarus: restarted memcached on mc1020 T235188
  • 18:32 mutante: moscovium - rename all files in /etc/request-tracker4/RT_SiteConfig.d to have a .pm extension - this fixed RT - login works again - puppet patch coming up (T180641)
  • 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 30111f3: Enable mapframe at kawiki (T229726) (duration: 00m 53s)
  • 18:28 mutante: moscovium - deleting /etc/request-tracker4/RT_SiteConfig.d/ 50-debconf.pm and 51-dbconfig-common.pm which duplicate the same files without .pm extension with wrong values, probably due to some package change (T180641)
  • 18:27 jgleeson: updated paymentswiki from 7bb9f5257e to e28bc54e85
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: c48271d: Revert "Config changes for Echo kask migration" (T222851) (duration: 00m 53s)
  • 18:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditor.php: SWAT: b19ad5f: Revert "Revert "ApiVisualEditor: Return etag with content for preloaded content""; 4f3b724: ApiVisualEditor: Fix preload handling further (T233320) (duration: 00m 53s)
  • 18:15 Urbanecm: Run mwscript namespaceDupes.php --wiki=thwikisource --fix (T236640)
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ea927dd: Rename author NS at thwikisource (T236640) (duration: 00m 53s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: ddaa534: Config changes for Echo kask migration (T222851) (duration: 00m 55s)
  • 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:12 bblack: mr1-eqiad: fix bast3004 access for eqiad mgmt network - T236686
  • 17:11 _joe_: starting rolling restart of memcached servers in eqiad, beginning with mc1019 T235188
  • 17:11 bblack: mr1-codfw: fix bast3004 access for codfw mgmt network - T236686
  • 17:10 bblack: mr1-ulsfo: fix bast3004 access for ulsfo mgmt network - T236686
  • 16:57 bblack: mr1-eqsin: fix bast3004 access for eqsin mgmt network - T236686
  • 16:56 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:55 bblack: mr1-esams: fix bast3004 access for esams mgmt network - T236686
  • 16:36 jbond42: restart puppetdb on puppetdb1001 to remove queue
  • 13:50 ema: pool cp5007 with ATS backend T227432
  • 13:30 godog: roll restart logstash in codfw/eqiad to apply new config
  • 13:23 effie: enable puppet on mw1*, depool and repool to reload apache - T229792
  • 13:13 effie: enable puppet on mw[1261-1265].eqiad.wmnet (mw canaries), depool and repool to reload apache - T229792
  • 13:07 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 effie: enable puppet on mw2* servers, depool and repool to reload apache - T229792
  • 13:01 jynus: stop db1114 for testing
  • 12:30 ema: depool cp5007 and reimage as text_ats T227432
  • 12:22 effie: depool mw2150
  • 11:56 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001 (duration: 00m 05s)
  • 11:56 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001
  • 11:34 Urbanecm: EU SWAT done
  • 11:33 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: 8caf681: Dont log missing ETags when creating a new page, thats normal (T233320) (duration: 00m 54s)
  • 11:33 effie: Disable puppet on mw* for 545652 - T229792
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dd2f06c: Add Translate channel for the Translate extension (T221119) (duration: 00m 53s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ff17666: Adjust wgUploadNavigationUrl for azwiki to point to commons UpWiz (T236307) (duration: 00m 53s)
  • 11:05 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 7e26ef4: Revert "Restrict uploads on azwiki" (T236307) (duration: 00m 53s)
  • 11:02 moritzm: installing OpenJDK security updates on elastic*
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 08:48 godog: bump udp_localhost kafka-logging topics to 6 partitions and roll-restart logstash and rsyslog - T215904
  • 08:26 volans: manually cleanup changes reverted in https://gerrit.wikimedia.org/r/546407 on icinga[12]001 - T222074
  • 08:25 moritzm: installing file/libmagic security updates
  • 08:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791 (duration: 13m 42s)
  • 08:15 godog: swift eqiad-prod: final weight to ms-be105[1-6] - T232367
  • 08:02 mobrovac@deploy1001: Started deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389 (duration: 13m 44s)
  • 07:40 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt (duration: 00m 05s)
  • 07:40 elukey@deploy1001: Started deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt
  • 07:37 elukey: upload archiva 2.2.4-1 to wikimedia-stretch (fix to avoid overriding archiva.xml upon install)
  • 07:27 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389
  • 07:25 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org (duration: 02m 37s)
  • 07:22 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org

2019-10-26

  • 11:30 XioNoX: restart cr3-esams
  • 11:01 XioNoX: re0.cr3-esams> request chassis routing-engine master switch

2019-10-25

  • 22:55 mutante: moscovium rm /dev/shm/envoy_shared_memory_0 to revive envoy which failed to run after changing ports and reinstalling it (T180641)
  • 22:42 mutante: moscovium - manually deleting envoy listener on 1443 and letting puppet recreate config because it's not removed if you change the port (T180641)
  • 21:55 mutante: running puppet on ulsfo cp-ats servers to pick up config change for RT backend
  • 20:42 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes (duration: 00m 06s)
  • 20:41 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: test deploy design/style-guide (duration: 00m 10s)
  • 20:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: test deploy design/style-guide
  • 17:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 bblack: lvs3005 - reimaging to fix partman issue, high-traffic1 (text) to lvs3007 for the duration
  • 16:43 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 bblack: lvs3006 - reimaging to fix partman issue, high-traffic2 (upload/maps) to lvs3007 for the duration
  • 16:19 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292 (duration: 13m 31s)
  • 16:05 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292
  • 16:04 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292 (duration: 00m 43s)
  • 16:04 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292
  • 15:35 robh: ps1-oe14-esams ip info set, rebooting (won't affect servers) via T184066
  • 15:03 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 15:01 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:00 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 bblack: cr[23]-esams: re-route ns2 IP to ganeti3003
  • 14:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292 (duration: 00m 44s)
  • 14:31 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292
  • 14:30 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292 (duration: 00m 05s)
  • 14:30 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292
  • 14:28 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292 (duration: 01m 02s)
  • 14:27 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292
  • 14:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:10 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:09 bblack: reboot ganeti3003
  • 13:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 ema: pool cp4032 with ATS backend T227432
  • 13:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 effie: depool mw1334 and pool back
  • 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4032.ulsfo.wmnet,service=ats-be
  • 13:05 ema: depool cp4032 and reimage as text_ats T227432
  • 12:34 jynus: introducing new freshness check for bacula T234900
  • 12:11 ema: pool cp4031 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:59 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4031.ulsfo.wmnet,service=ats-be
  • 09:56 ema: depool cp4031 and reimage as text_ats T227432
  • 09:39 ema: pool cp4030 with ATS backend T227432
  • 09:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 XioNoX: powering off mr1-esams again
  • 09:20 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 XioNoX: going to power down mr1-esams (esams mgmt is going to go down) for 30 min, the time needed to move power cables
  • 09:02 jynus: disabling persistent journald on db1074
  • 09:01 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4030.ulsfo.wmnet,service=ats-be
  • 08:58 ema: depool cp4030 and reimage as text_ats T227432
  • 08:48 vgutierrez: switch from nginx to ats-tls on cp3050 - T231627
  • 08:45 godog: stop prometheus on bast300[24] and do the last round of data rsync - T236329
  • 08:37 ema: lvs1015: restart pybal to add labweb-ssl T210411
  • 08:36 ema: test
  • 08:34 ema@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
  • 08:32 ema: lvs1016: restart pybal to add labweb-ssl T210411
  • 08:02 vgutierrez: rolling restart of ats-tls to introduce a SSL handshake timeout of 60 secs - T236458
  • 07:35 akosiaris: reboot webperf1002 for disk resize T235455
  • 07:29 akosiaris: reboot webperf2002 for disk resize T235455
  • 05:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:35 vgutierrez: reimage lvs3007 to let it get the proper partman configuration - T236294
  • 05:03 vgutierrez: Applying a SSL handshake timeout of 60 secs on ats-tls/cp5007 - T236458
  • 04:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:55 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:53 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:52 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:50 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:49 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:24 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns3001.*
  • 03:08 bblack: cr2-esams + cr3-esams : remove nescio and maerlant from anycast4 neighbor list
  • 03:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 03:05 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3049.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 02:44 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3043.esams.wmnet
  • 02:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 02:09 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 01:52 bblack: mr1-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:50 bblack: asw2-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:46 bblack: cr2-esams + cr3-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 01:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3047.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3046.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 01:13 mutante: puppetmaster1001 - revoking parsoid.svc.eqiad / parsoid.svc.codfw / parsoid.discovery.wmnet certificates and creating new ones including parsoid-php.discovery.wmnet (T233654)
  • 00:52 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/LiquidThreads/classes/View.php: (no justification provided) (duration: 00m 54s)

2019-10-24

  • 23:46 mutante: bast3002 - rsyncing /home, /srv/tftpboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329)
  • 23:24 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s)
  • 22:16 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:13 mutante: gerrit1001 - starting gerrit
  • 22:13 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 thcipriani: stopping gerrit briefly for script run for T236344
  • 22:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:01 mutante: mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. could cause monitoring spam on more servers (T229792)
  • 21:56 eileen: civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1
  • 21:16 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
  • 21:16 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 21:12 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet
  • 21:06 bblack: cr3-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 bblack: cr2-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 urandom: restbase cassandra rolling restart, codfw / rack 'd' -- T200803
  • 21:02 bblack: downtimed lvs3001-4, stopping pybal there, etc...
  • 20:58 bblack: cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:58 bblack: cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:40 bblack: esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently)
  • 20:36 bblack: esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again)
  • 20:33 urandom: restbase cassandra rolling restart, codfw / rack 'c' -- T200803
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
  • 20:23 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet
  • 20:22 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 20:04 bblack: reboot cp3054 again for good measure
  • 19:57 bblack: cp3054 - trying racadm serveraction hardreset
  • 19:32 bblack: reboot dns3001
  • 19:31 urandom: restbase cassandra rolling restart, codfw / rack 'b' -- T200803
  • 19:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:06 urandom: restbase cassandra rolling restart, rack 'd' -- T200803
  • 19:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:59 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:57 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 Urbanecm: Morning SWAT done
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 urandom: restbase cassandra rolling restart, rack 'b' -- T200803
  • 18:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:31 bblack: cr3-esams: add dns3001 to anycast4 neighbors
  • 18:30 bblack: cr2-esams: add dns3001 to anycast4 neighbors
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 263fd0f: Enable Wikibase client access on commonswiki (T223792) (duration: 00m 52s)
  • 18:25 urandom: restbase cassandra rolling restart, rack 'a' -- T200803
  • 18:22 robh: completing ps1-b6-eqiad setup, pdu will reboot twice, power output unaffected T227540
  • 18:20 robh: ps1-a6-eqiad setup complete, icinga errors should clear up T227142
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 84c48df: rename service definition (T222851) (duration: 00m 53s)
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b20d6de: Reference Previews: full beta deployment (T235083) (duration: 00m 52s)
  • 18:03 robh: setting ip info for ps1-a6-eqiad, it is rebooting. T227142
  • 17:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 ema: pool cp3059 (cache_upload) T233242
  • 17:29 bblack: asw2-esams - committing switch port/vlan config for new rack 14 hosts
  • 17:26 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable Parsoid/PHP in the whole wtp (a.k.a. Parsoid) cluster - T236388 (duration: 00m 53s)
  • 17:18 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:54 ema: depool cp3036 (cache_upload) T233242
  • 16:39 urandom: restarting cassandra, restbase2011 (canary for config changes) -- T200803
  • 16:32 urandom: restarting cassandra, restbase1016 (canary for config changes) -- T200803
  • 16:28 ema: depool cp3035 (cache_upload) T233242
  • 16:07 ema: pool cp3057 (cache_upload) T233242
  • 15:51 ema: depool cp3032 (cache_text) T233242
  • 15:45 ema: depool cp3034 (cache_upload) T233242
  • 15:40 ema: depool cp3030 (cache_text) T233242
  • 15:27 bblack: asw2-esams: configure port descriptions and vlan/lvs groupings for all rack16 hosts (lvs3007, ganeti3003, bast3004, cp3061-5)
  • 15:19 ema: pool cp3058 (cache_text) T233242
  • 15:18 effie: Slowly reload apache across the fleet (as we are enabling puppet) - T229792
  • 15:09 effie: Remove hhvm packages and enable puppet across the fleet - T229792
  • 15:09 ema: pool cp3055 (cache_upload) T233242
  • 15:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testcommonswiki, Enable Wikibase client access T223792 (duration: 00m 53s)
  • 15:00 bblack: cr2-esams - add missing lvs3005 IP to bgp pybal neighbor list
  • 14:58 bblack: cr3-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:58 bblack: cr2-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:47 effie: run puppet on all canaries and codfw - T229792
  • 14:42 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:40 effie: Remove hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from all canaries and codfw - T229792
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:26 bblack: lvs3006 (upload, becoming active) - manual pybal med s/90/0/ (will take over from lvs3002, intended permanently).
  • 14:23 bblack: lvs3006 (upload, inactive) - manual pybal med s/100/90/ (preferred to lvs3004 for fallback from lvs3002)
  • 14:22 effie: enable puppet on mw app canaries
  • 14:16 ema: power-cycle cp3056, stuck rebooting into d-i T233242
  • 13:59 ema: pool cp3060 T233242
  • 13:36 bblack: re-pooling esams in dns
  • 13:34 effie: enable puppet on mwdebug*
  • 13:25 XioNoX: enable transit4/6 on cr2-knams
  • 13:24 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=varnish-be,name=cp30[56].*
  • 13:24 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp30[56].*,service=varnish-be
  • 13:23 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=nginx
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=nginx
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3063.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3051.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3059.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3061.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3057.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3065.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3055.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3053.esams.wmnet
  • 13:17 ema: set ats-be weights on new esams upload nodes T233242
  • 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.3
  • 12:56 effie: purge hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from mw* canaries - T229792
  • 12:42 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp3060.esams.wmnet,service=varnish-be
  • 12:33 effie: Stopping puppet on all hosts including the hhvm class (C:hhvm) - 544864 - T229792
  • 12:25 ema: cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242
  • 12:14 bblack: depool esams in geodns
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 after analyze table', diff saved to https://phabricator.wikimedia.org/P9468 and previous config saved to /var/cache/conftool/dbconfig/20191024-120812-marostegui.json
  • 12:06 XioNoX: shutdown cr1-esams - cr2-knams link
  • 12:00 XioNoX: shutdown transit BGP sessions on cr2-knams
  • 11:40 Urbanecm: EU SWAT done
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3a5cb68: Permission changes of move-rootuserpages assignment at commonswiki (T236359) (duration: 01m 00s)
  • 11:33 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:31 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 Urbanecm: Run mwscript namespaceDupes.php --wiki=commonswiki --add-prefix=FIXME --fix (T236352)
  • 11:28 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e079956: Add CAT as alias for NS_CATEGORY at commonswiki (T236352) (duration: 01m 00s)
  • 11:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 2d66deb: Restrict uploads on azwiki (T236307) (duration: 01m 03s)
  • 11:15 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/WikibaseMediaInfo: Also use custom PrefetchingTermLookup in SingleEntitySourceServices (duration: 01m 01s)
  • 11:13 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Allow defining entity-type-specific PrefetchingTermLookup (duration: 01m 06s)
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights for db1093 and db1085', diff saved to https://phabricator.wikimedia.org/P9466 and previous config saved to /var/cache/conftool/dbconfig/20191024-101810-marostegui.json
  • 09:59 hashar: Converting CI jobs to use the new PostBuildScript plugin config | https://gerrit.wikimedia.org/r/#/c/integration/config/+/544907/ | T188398
  • 09:57 hashar: Restarting CI Jenkins
  • 09:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T234853 Re-enable performance perception survey on ruwiki (duration: 01m 04s)
  • 08:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:36 godog: roll restart rsyslog in codfw/eqiad to pick up new kafka partitions
  • 08:18 godog: roll restart rsyslog in ulsfo/esams/eqsin to pick up new kafka partitions
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092 for analyze table', diff saved to https://phabricator.wikimedia.org/P9465 and previous config saved to /var/cache/conftool/dbconfig/20191024-081519-marostegui.json
  • 07:57 XioNoX: reboot mr1-esams
  • 07:42 godog: bump rsyslog- topics partitions to 6 and roll-restart logstash frontends
  • 07:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:22 XioNoX: drain Telia link on cr2-esams
  • 06:32 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid-php,name=eqiad
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9463 and previous config saved to /var/cache/conftool/dbconfig/20191024-052002-marostegui.json
  • 05:18 marostegui: Run analyze enwiki.revision on db2092 T223151
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9462 and previous config saved to /var/cache/conftool/dbconfig/20191024-045954-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from special slaves group and leave it with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9461 and previous config saved to /var/cache/conftool/dbconfig/20191024-045924-marostegui.json
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9460 and previous config saved to /var/cache/conftool/dbconfig/20191024-045544-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:55 shdubsh: temporarily turn down accept delay on fermium - T235983
  • 00:03 mutante: restarting gerrit to increase heap_size from 20G to 32G (T225166 T222391)

2019-10-23

  • 22:55 brennen@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/AbuseFilter: SWAT: Unbreak filter edit form (T236286) (duration: 01m 05s)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:20 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 05s)
  • 22:19 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:15 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 01m 10s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:00 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:00 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 21:32 mutante: webperf1002/2002 - starting bacula-fd service that failed after the initial puppet run turning them into backup::hosts
  • 21:14 ejegg: updated Fundraising python tools from b3c7453be2 to ffc7bf764b
  • 20:37 shdubsh: restart nagios-nrpe-server on stat1007
  • 18:56 milimetric@deploy1001: Finished deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts (duration: 07m 53s)
  • 18:49 milimetric@deploy1001: Started deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts
  • 18:29 mforns@deploy1001: Finished deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59 (duration: 06m 40s)
  • 18:22 mforns@deploy1001: Started deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59
  • 17:31 akosiaris: restart varnish-be on cp1089 as a response to HTTP availability alerts. High mailbox lag
  • 17:25 akosiaris: restart varnish-be on cp1081 as a response to HTTP availability alerts
  • 15:55 _joe_: restarting pybal on lvs2006, then 2003 for picking up parsoid-php
  • 15:32 marostegui: Enable slow query log 1/20 on db1089 (enwiki) T223151
  • 14:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:39 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:38 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:37 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:35 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:19 bblack: repooling esams
  • 14:00 hashar: Restarting CI Jenkins
  • 13:57 _joe_: manually changing the symlinked deployed version of parsoid on wtp1025 T236275
  • 13:35 XioNoX: migrate esams mgmt to new mgmt router
  • 13:34 effie: disable puppet on mwdebug1002 - T214734
  • 13:13 ssastry@deploy1001: Finished deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues (duration: 08m 44s)
  • 13:07 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.3 (duration: 01m 00s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.3
  • 13:04 ssastry@deploy1001: Started deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues
  • 12:37 effie: Depool mwdebug1002 - T214734
  • 12:31 vgutierrez: restarting ats-tls on cache text nodes - T233274
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from the special slaves group on s5 and leave it back with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9454 and previous config saved to /var/cache/conftool/dbconfig/20191023-122708-marostegui.json
  • 11:26 XioNoX: powering down cr1-esams
  • 11:24 Urbanecm: EU SWAT done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: e21054e: Add Balinese to interwiki sort orders (T234768) (duration: 01m 01s)
  • 11:18 Urbanecm: mwscript updateArticleCount.php --wiki=frwikiquote --update (T236212)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (2/2; T234278) (duration: 01m 01s)
  • 11:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (1/2; T234278) (duration: 01m 01s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf8e2f1: Set $wgArticleCountMethod to any for frwikiquote (T236212) (duration: 01m 12s)
  • 10:46 ema: cp-ats: rolling ATS backend restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545522/ T233274
  • 10:13 jynus: reverting dbtree revision to HEAD~1 T224589
  • 10:11 jynus: deploying new version of dbtree T224589
  • 10:04 ema: cp1075: ats-backend-restart to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545508/
  • 09:42 godog: bounce burrow-logging-eqiad.service on kafkamon1001
  • 09:40 godog: roll restart logstash to pick up new rsyslog-notice partitions
  • 09:31 godog: bump rsyslog-notice topic to 6 partitions
  • 09:00 moritzm: rebooting logstash2021 for some firmware tests
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 moritzm: installing systemd bugfix update on mw canaries
  • 08:50 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 godog: roll restart rsyslog on cirrus and wdqs hosts to pick up changes to logback topic partitions
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312 after table compression', diff saved to https://phabricator.wikimedia.org/P9452 and previous config saved to /var/cache/conftool/dbconfig/20191023-082826-marostegui.json
  • 08:23 godog: roll restart logstash in codfw/eqiad to pick up new kafka partitions
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9451 and previous config saved to /var/cache/conftool/dbconfig/20191023-082246-marostegui.json
  • 08:11 godog: kafka-logging eqiad set 12 partitions for ^mwlog- ^logback- and eqiad.client.error topics
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9450 and previous config saved to /var/cache/conftool/dbconfig/20191023-080857-marostegui.json
  • 07:55 godog: kafka-logging delete unused topic syslog-notice
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9449 and previous config saved to /var/cache/conftool/dbconfig/20191023-075106-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9448 and previous config saved to /var/cache/conftool/dbconfig/20191023-074828-marostegui.json
  • 07:46 XioNoX: powering down cr2-esams for relocation (for real this time)
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9447 and previous config saved to /var/cache/conftool/dbconfig/20191023-073831-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9446 and previous config saved to /var/cache/conftool/dbconfig/20191023-073556-marostegui.json
  • 07:30 XioNoX: powering down cr2-esams for relocation
  • 07:28 hashar: logstash: refreshing index fields for logstash-* indices (via https://logstash.wikimedia.org/app/kibana#/management/kibana/indices/logstash-* ) # T234564
  • 07:05 XioNoX: redirect ns2 to eqiad - T235805
  • 07:04 marostegui: Enable slow query log 1/10 on db1089 (enwiki) T223151
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:59 XioNoX: depool esams - T235805
  • 06:57 effie: Depooling mw1317
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:38 marostegui: Compress tables on db1097:3315 T235599
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9445 and previous config saved to /var/cache/conftool/dbconfig/20191023-063800-marostegui.json
  • 05:29 ema@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kibana,name=codfw
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9444 and previous config saved to /var/cache/conftool/dbconfig/20191023-052940-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9443 and previous config saved to /var/cache/conftool/dbconfig/20191023-050812-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9442 and previous config saved to /var/cache/conftool/dbconfig/20191023-045722-marostegui.json
  • 04:49 vgutierrez: repool cp5007 - T234887
  • 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9441 and previous config saved to /var/cache/conftool/dbconfig/20191023-044833-marostegui.json
  • 04:36 MaxSem: Fixed a page title via namespaceDupes.php on pswiki
  • 03:51 vgutierrez: depool cp5007 - T234887

2019-10-22

  • 23:57 maxsem@deploy1001: Synchronized php-1.35.0-wmf.3/includes/block/DatabaseBlock.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/545373/ (duration: 00m 59s)
  • 23:53 maxsem@deploy1001: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543943/ (duration: 01m 01s)
  • 23:43 maxsem@deploy1001: Synchronized dblists/: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 00m 59s)
  • 23:41 maxsem@deploy1001: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 01s)
  • 23:38 maxsem@deploy1001: Synchronized dblists/labtestwiki.dblist: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 02s)
  • 23:32 mutante: LDAP - added keepit-ssh to wmf group (T236209)
  • 22:23 ejegg: updated Fundraising CiviCRM from ff69d64ad4 to 47e0800001
  • 21:57 thcipriani: stopping gerrit to run ref-update script T236114
  • 21:57 thcipriani: stopping gerrit to run ref-update script
  • 21:45 mutante: LDAP - added lexnasser to nda group (T235688)
  • 21:07 eileen: process-control config revision is 95ee1bafb3 dedupe job re-enabled
  • 20:09 mutante: gerrit1001 - mkdir /srv/gerrit/cobalt/git - rsyncing /srv/gerrit/git from cobalt to /srv/gerrit/cobalt/git/ on gerrit1001 (T236114)
  • 19:42 hashar: gerrit1001: apt install colordiff # T236114
  • 19:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.3
  • 19:03 brennen: proceeding with train for 1.35.0-wmf.3
  • 18:09 mutante: DNS - added new Wikipedia language "mnw" (Mon) T235739 - a language spoken in Myanmar
  • 17:59 sbassett: Uploaded and applied (but did not deploy per releng) security fix for T234450 to wmf.3
  • 17:57 sbassett: Deployed security fix for T234450 to wmf.2
  • 17:57 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213) (duration: 05m 14s)
  • 17:54 mutante: restarting gerrit to disable jgit gc (T236114)
  • 17:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213)
  • 17:37 arlolra: Updated Parsoid to cf01d91 (T234057, T234768, T235296, T235684, T235563)
  • 17:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91 (duration: 07m 37s)
  • 17:20 bblack: geodns: re-pooling esams (at this point, we're entirely back in our "normal" state of affairs)
  • 17:19 arlolra@deploy1001: Started deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91
  • 16:51 bblack: geodns: moving all "normal" eqiad traffic back to eqiad (in addition to the esams-diverted traffic which is still pointed mostly at eqiad right now)
  • 16:21 mutante: running puppet on deployment servers
  • 16:20 thcipriani: restarting gerrit
  • 16:14 thcipriani: stopping gerrit to run a fix for T222391
  • 15:58 bblack: depooling esams temporarily to test traffic scenario on lvs1014
  • 15:47 bblack: enable pybal+puppet on rebooted lvs1014
  • 15:40 bblack: rebooting lvs1014
  • 15:28 liw@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache (duration: 37m 39s)
  • 15:26 XioNoX: repool esams
  • 15:20 XioNoX: rollback ns2 redirect
  • 15:13 bblack: re-disabling lvs1014 ...
  • 15:10 bblack: re-enabling lvs1014 pybal/puppet
  • 15:03 moritzm: rebooting kafka-main1005 for microcode debugging
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:52 bblack: stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016)
  • 14:50 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache
  • 14:45 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0 (duration: 02m 44s)
  • 14:42 mbsantos@deploy1001: Started deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0
  • 14:13 XioNoX: restart asw-esams for onsite work
  • 13:52 andrewbogott: restarted slapd on ldap-eqiad-replica01
  • 13:38 gehel: silencing LVS check for kartotherian (we know there is an issue) - T236163
  • 13:35 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_2419219323" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 06m 40s)
  • 13:28 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.3 and rebuild l10n cache
  • 13:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 XioNoX: depool esams for onsite work - T235805
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3316 db1105:3311 db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9434 and previous config saved to /var/cache/conftool/dbconfig/20191022-130556-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9433 and previous config saved to /var/cache/conftool/dbconfig/20191022-125435-marostegui.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9432 and previous config saved to /var/cache/conftool/dbconfig/20191022-124607-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3316 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9431 and previous config saved to /var/cache/conftool/dbconfig/20191022-123757-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3312 and db1105:3311 after on-site maintenance T235877', diff saved to https://phabricator.wikimedia.org/P9430 and previous config saved to /var/cache/conftool/dbconfig/20191022-123257-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315', diff saved to https://phabricator.wikimedia.org/P9429 and previous config saved to /var/cache/conftool/dbconfig/20191022-123032-marostegui.json
  • 12:29 moritzm: rebooting miscweb2001 for some microcode tests
  • 12:28 marostegui: Compress db1096:3315
  • 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 after PDU maintenance T227142 (duration: 00m 50s)
  • 12:15 jynus: reimage to buster dbmonitor2001.wikimedia.org T224589
  • 11:57 liw: starting to cut branch for train 1.35-wmf.3
  • 11:51 hashar: Restarted CI Jenkins on contint1001
  • 11:35 marostegui: Stop MySQL on db1105:3311, db1105:3312 for firmware upgrade - T235877
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312 for firmware upgrade T235877', diff saved to https://phabricator.wikimedia.org/P9428 and previous config saved to /var/cache/conftool/dbconfig/20191022-113437-marostegui.json
  • 11:29 Urbanecm: EU SWAT done
  • 11:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor/: SWAT: 2bc4420 (T235707); 680a98b (T233320); d83265d (T234564) (duration: 00m 53s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0593f34: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections (T230614) (duration: 00m 54s)
  • 10:55 moritzm: rebooting rpki2001 for some microcode tests
  • 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:37 ema@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kibana
  • 10:32 jynus: shutting down db1115 in preparation for PDU maintenance, this will make tendril and dbtree unavailable for 2 hours T227142
  • 10:21 ema: lvs2003: restart pybal to add new service kibana-ssl T210411
  • 10:18 ema: lvs1015: restart pybal to add new service kibana-ssl T210411
  • 10:14 ema: puppetmaster1001: rm /var/run/confd-template/.kibana-ssl*.err to make confd icinga check happy T210411
  • 10:02 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=kibana-ssl
  • 09:54 ema: lvs2006: restart pybal to add new service kibana-ssl T210411
  • 09:54 ema: lvs1016: restart pybal to add new service kibana-ssl T210411
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9425 and previous config saved to /var/cache/conftool/dbconfig/20191022-091327-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9424 and previous config saved to /var/cache/conftool/dbconfig/20191022-091051-marostegui.json
  • 08:05 marostegui: Stop MySQL on labsdb1012 for PDU work T227142
  • 07:53 marostegui: Stop MySQL on db1116 pc1007 db1096:3315, db1096:3316 for PDU maintenance T227142
  • 07:18 moritzm: installing tcpdump security updates
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1010 T227142 (duration: 00m 52s)
  • 06:32 vgutierrez: rolling restart of ats-tls - T233274 T234803
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9423 and previous config saved to /var/cache/conftool/dbconfig/20191022-055151-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1070 from config T235464', diff saved to https://phabricator.wikimedia.org/P9422 and previous config saved to /var/cache/conftool/dbconfig/20191022-054759-marostegui.json
  • 05:41 marostegui: Stop mysql on db1070 - T235464
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1070 from config T235464 (duration: 00m 51s)
  • 05:40 marostegui: Remove db1070 from tendril and zarcillo - T235464
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1070 from config T235464 (duration: 00m 53s)
  • 05:33 vgutierrez: Switch from nginx to ats-tls on cp1090 - T231433
  • 05:24 vgutierrez: repooling cp2025 - T231433
  • 05:20 vgutierrez: depooling cp2025 to fix ATS/nginx configuration - T231433
  • 05:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:08 vgutierrez: Switch from nginx to ats-tls on cp1088 - T231433
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9421 and previous config saved to /var/cache/conftool/dbconfig/20191022-050204-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9420 and previous config saved to /var/cache/conftool/dbconfig/20191022-050048-marostegui.json
  • 04:58 vgutierrez: Switch from nginx to ats-tls on cp2026 - T231433
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp2024 - T231433
  • 04:18 vgutierrez: Switch from nginx to ats-tls on cp3049 - T231433
  • 03:44 vgutierrez: Switch from nginx to ats-tls on cp3047 - T231433
  • 01:12 eileen: disabled dedupe job pending T236096 deploy
  • 01:12 eileen: process-control config revision is 782a14c7d9

2019-10-21

  • 23:15 thcipriani: ops/puppet:sudo -u gerrit2 git update-ref refs/changes/66/535966/meta d6909e0 && sudo -u gerrit2 git update-ref refs/changes/66/535966/meta 8494c28 on gerrit1001
  • 23:11 mutante: rsynced operations/puppet.git/objects from cobalt to gerrit1001 (and backup in /root) (T222391)
  • 22:23 mutante: mw1340 - restarting php7.2-fpm, restarting apache2
  • 21:27 mutante: gerrit1001 manually running command from "list_mediawiki_extensions" cron (T222391)
  • 21:26 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b 30 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 21:23 thcipriani: ssh -p 29418 gerrit.wikimedia.org -- gerrit index start changes --force
  • 21:21 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2, ran puppet again. gerrit back up (T222391)
  • 21:18 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2
  • 21:16 cdanis: previous cumin invocation was to unblock gerrit migration; will be automatically restored to usual on next puppet run. T222391
  • 21:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin A:dns-auth 'perl -p -i".bak" -e "s/gerrit\./gerrit-replica./" /etc/wikimedia-authdns.conf'
  • 20:57 mutante: running puppet on gerrit1001
  • 20:57 thcipriani: running puppet on cobalt
  • 20:52 mutante: rsyncing gerrit-data/plugins and /var/lib/gerrit2/review_site/ again
  • 20:51 mutante: rsyncing gerrit-data/git again
  • 20:50 thcipriani: stopping gerrit on cobalt
  • 20:44 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 52s)
  • 20:37 mutante: disabled puppet on cobalt and gerrit2001
  • 20:29 mutante: running puppet on dbproxy10017 to apply ferm change for gerrit db from gerrit1001 (T222391)
  • 20:25 mutante: gerrit1001 - puppet agent disabled - gerrit service stopped
  • 20:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f (duration: 06m 02s)
  • 20:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f
  • 20:12 mutante: rsyncing /var/lib/gerrit2/review_site from cobalt to gerrit1001 (T222391)
  • 20:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/545027/ T235949 (duration: 00m 52s)
  • 20:08 mutante: rsynced /srv/gerrit/plugins from cobalt to gerrit1001 (T222391)
  • 20:08 mutante: rsynced /srv/gerrit/git from cobalt to gerrit1001 (T222391)
  • 18:43 Urbanecm: Morning SWAT done
  • 18:41 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor: SWAT: a4ab456: TreeModifier: Ignore removed nodes properly when normalizing from a text node (T235959); ecb4532: Update VE core submodule to a4ab456dc0 (T235959); a850cee: ApiVisualEditor: Always return etag with content (T233320) (duration: 00m 55s)
  • 18:32 robh: ps1-23-ulsfo back online, all pdu work in ulsfo is now complete T235911
  • 18:30 robh: ps1-22-ulsfo repaired (reseating its NIC rebooted its mgmt interface) Done with it and repeating on ps1-23-ulsfo via T235911
  • 18:24 robh: working on ps1-22-ulsfo via T235911 (it may flap but it is already ack'd as down in icinga, but not persistent)
  • 17:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@75c0577]: GUI Updates (duration: 11m 37s)
  • 17:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/: Update VisualEditor for set of back-ports in wmf.1 T233320, T234564, T235959 (duration: 00m 56s)
  • 17:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@75c0577]: GUI Updates
  • 14:16 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.2 refs T233850
  • 13:46 Urbanecm: Deploy sec patch for T104807
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3314 and db2091:3312 for table compression', diff saved to https://phabricator.wikimedia.org/P9412 and previous config saved to /var/cache/conftool/dbconfig/20191021-132633-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9411 and previous config saved to /var/cache/conftool/dbconfig/20191021-132440-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9410 and previous config saved to /var/cache/conftool/dbconfig/20191021-132145-marostegui.json
  • 13:07 ema: lvs1015: restart pybal to add new service wdqs-ssl T210411
  • 13:04 marostegui: Deploy schema change on db1122 (s2 primary master) - T233135 T234066
  • 13:04 ema: lvs2003: restart pybal to add new service wdqs-ssl T210411
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312 after schema change and remove db1129 from vslow and dump as it was there temporarily', diff saved to https://phabricator.wikimedia.org/P9409 and previous config saved to /var/cache/conftool/dbconfig/20191021-130355-marostegui.json
  • 13:02 ema: lvs1016: restart pybal to add new service wdqs-ssl T210411
  • 13:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wdqs-ssl
  • 12:58 ema: lvs2006: restart pybal to add new service wdqs-ssl T210411
  • 12:38 hashar: Started zuul-merger on contint2001
  • 12:32 hashar: Stopped zuul-merger on contint2001
  • 12:31 hashar: Started zuul-merger on contint1001
  • 12:16 hashar: Stopped zuul-merger on contint1001
  • 12:02 Urbanecm: EU SWAT finally done
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e8d70c1: Partial cleanup of InitialiseSettings (T231178) (duration: 01m 00s)
  • 12:00 Urbanecm: I'm going to do one last sync for EU SWAT
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 12e3549: Create Portal namespace for sawikisource (T235343) (duration: 00m 59s)
  • 11:55 urbanecm@deploy1001: sync-file aborted: SWAT: 12e3549: Create Portal namespace for sawikisource (duration: 00m 01s)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3b1350b: wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there (T235904) (duration: 00m 59s)
  • 11:49 Urbanecm: Reopen EU SWAT
  • 11:42 awight: EU SWAT complete
  • 11:42 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Put reference previews back into beta mode on beta cluster (T233813) (duration: 01m 00s)
  • 11:38 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 543764|Enable ContentTranslation out of Beta in Malayalam/Bengali/Mongolian WPs (T233008, T233009, T234317) (duration: 01m 00s)
  • 11:34 moritzm: installing Java security updates on restbase-dev1004
  • 11:30 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/tests/phpunit/includes/Storage/SqlBlobStoreTest.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 3/3 - T235188 (duration: 01m 00s)
  • 11:28 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 2/3 - T235188 (duration: 00m 59s)
  • 11:25 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/Storage/SqlBlobStore.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 1/3 - T235188 (duration: 01m 00s)
  • 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:19 hashar: contint1001 / contint2001 : marking integration/config zuul merger repo readonly: sudo chown -R root:root /srv/zuul/git/integration/config
  • 10:13 hashar: CI in trouble due to a huge number of changes
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 Amir1: maintenance script is done
  • 09:35 moritzm: removing PHP 7.0 from deployment servers
  • 09:20 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T234774)
  • 09:18 moritzm: installing php7.0 security updates
  • 09:11 moritzm: installing subversion updates on Stretch (fixes compatibility with security fix for Apache update)
  • 09:07 moritzm: installing jackson-databind security updates
  • 09:01 moritzm: installing openjpeg2 security updates
  • 08:52 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/544209
  • 08:34 Urbanecm: Deploy security patch (T234862)
  • 08:34 vgutierrez: Switch from nginx to ats-tls on cp2022 - T231627
  • 08:30 ema: pool cp4029 with ATS backend T227432
  • 08:20 vgutierrez: Switch from nginx to ats-tls on cp2020 - T231627
  • 08:09 vgutierrez: Switch from nginx to ats-tls on cp2018 - T231627
  • 08:08 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 08:03 godog: swift codfw-prod: final weight to ms-be205[1-6] - T233638
  • 07:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:57 vgutierrez: Switch from nginx to ats-tls on cp3046 - T231627
  • 07:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:50 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4029.ulsfo.wmnet,service=ats-be
  • 07:45 moritzm: installing aspell security updates on jessie
  • 07:43 vgutierrez: Switch from nginx to ats-tls on cp3045 - T231627
  • 07:35 moritzm: installing openjdk-11 security updates
  • 07:32 ema: depool cp4029 and reimage as text_ats T227432
  • 07:15 vgutierrez: Switch from nginx to ats-tls on cp1075 - T231627
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool non partitioned db1089 into s1 special slaves to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9406 and previous config saved to /var/cache/conftool/dbconfig/20191021-070655-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9405 and previous config saved to /var/cache/conftool/dbconfig/20191021-070352-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9404 and previous config saved to /var/cache/conftool/dbconfig/20191021-070119-marostegui.json
  • 06:59 vgutierrez: Switch from nginx to ats-tls on cp2001 - T231627
  • 06:46 vgutierrez: Switch from nginx to ats-tls on cp3030 - T231627
  • 06:28 vgutierrez: Install python3-cryptography-2.6.1-3+deb10u2 on acme-chief hosts - T234131
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9403 and previous config saved to /var/cache/conftool/dbconfig/20191021-061518-marostegui.json
  • 06:12 vgutierrez: Switch cp1086 from nginx to ats-tls - T231433
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1130 on s5 to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9402 and previous config saved to /var/cache/conftool/dbconfig/20191021-055843-marostegui.json
  • 05:54 vgutierrez: Switch cp2017 from nginx to ats-tls - T231433
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9401 and previous config saved to /var/cache/conftool/dbconfig/20191021-055017-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2048 and db2061, those hosts will be decommissioned T228258', diff saved to https://phabricator.wikimedia.org/P9400 and previous config saved to /var/cache/conftool/dbconfig/20191021-054340-marostegui.json
  • 05:42 _joe_: slowly removing service objects from production etcd T233973
  • 05:38 vgutierrez: Switch cp3044 from nginx to ats-tls - T231433
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9399 and previous config saved to /var/cache/conftool/dbconfig/20191021-053737-marostegui.json
  • 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui: Compress tables on db2084:3314 db2091:3312 - T235599
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P9398 and previous config saved to /var/cache/conftool/dbconfig/20191021-052643-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312 db2084:3315 - T235599', diff saved to https://phabricator.wikimedia.org/P9397 and previous config saved to /var/cache/conftool/dbconfig/20191021-052527-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9396 and previous config saved to /var/cache/conftool/dbconfig/20191021-052035-marostegui.json
  • 05:19 vgutierrez: Switch cp4026 from nginx to ats-tls - T231433
  • 05:14 marostegui: Deploy schema change on db1090:3312 T234066 T233135
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312 for schema change and pool db1129 temporarily in vslow, dump', diff saved to https://phabricator.wikimedia.org/P9395 and previous config saved to /var/cache/conftool/dbconfig/20191021-051356-marostegui.json
  • 05:09 marostegui: Deploy schema change on s7 primary master db1062 - T234066 T233135
  • 04:57 vgutierrez: Switch cp5006 from nginx to ats-tls - T231433

2019-10-19

  • 08:41 XioNoX: add user papaul to fasw-c-eqiad
  • 00:05 mutante: LDAP - adding verenali to wmde and nda groups, to match raja_wmde (T233807, T231677)

2019-10-18

  • 22:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet,service=parsoid-php
  • 22:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet,service=parsoid-php
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet,service=parsoid-php
  • 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet,service=parsoid-php
  • 22:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet,service=parsoid-php
  • 22:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet,service=parsoid-php
  • 22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet,service=parsoid-php
  • 22:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet,service=parsoid-php
  • 22:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2032.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet,service=parsoid-php
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet,service=parsoid-php
  • 21:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet,service=parsoid-php
  • 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet,service=parsoid-php
  • 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet,service=parsoid-php
  • 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet,service=parsoid-php
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet,service=parsoid-php
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet,service=parsoid-php
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet,service=parsoid-php
  • 20:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet,service=parsoid-php
  • 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet,service=parsoid-php
  • 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet,service=parsoid-php
  • 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 19:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 18:27 mutante: temp. disabled puppet on all wtp* servers, adding mediawiki appserver roles on them incrementally by re-enabling puppet, starting with wtp1026, scheduled icinga downtime for wtp* all services (T233654)
  • 18:19 mutante: temp. disabling puppet on all wtp* servers
  • 15:40 Urbanecm: Reassign edits from DannyS712 (T235446) to DannyS712 at banwiki (T235446)
  • 15:38 Urbanecm: Run extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=banwiki DannyS712 (T235446)
  • 15:38 Urbanecm: Rename DannyS712@banwiki to DannyS712 (T235446) locally (T235446)
  • 15:07 Urbanecm: Reattach DannyS712@banwiki to DannyS712@SUL (T235446)
  • 14:19 _joe_: uploading cassandra 3.11.4 to stretch-wikimedia
  • 14:10 marostegui: Run compare.py on db1105 - T235877
  • 13:48 jynus: disabled notifications on db1105
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and db1105:3312 host rebooted itself', diff saved to https://phabricator.wikimedia.org/P9392 and previous config saved to /var/cache/conftool/dbconfig/20191018-134517-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2059 from config, host decommissioned', diff saved to https://phabricator.wikimedia.org/P9391 and previous config saved to /var/cache/conftool/dbconfig/20191018-132934-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3315 for tables compression T235599', diff saved to https://phabricator.wikimedia.org/P9390 and previous config saved to /var/cache/conftool/dbconfig/20191018-130253-marostegui.json
  • 13:01 marostegui: Compress db2084:3315 T235599
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P9389 and previous config saved to /var/cache/conftool/dbconfig/20191018-123930-marostegui.json
  • 12:20 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:10 jbond42: disable puppet on puppetmasters to fix puppet-merge
  • 11:58 moritzm: installing sudo security updates for jessie
  • 11:56 Reedy: `mwscript refreshLinks.php banwiki` on mwmaint1002 T235843
  • 11:10 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4028.ulsfo.wmnet,service=ats-be
  • 10:56 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet - T234175
  • 10:53 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet
  • 10:49 effie: Uploading wikidiff2_1.9.0-2~wmf1 to stretch-wikimedia T231586
  • 09:58 moritzm: rolling out debdeploy 0.0.99.12 fleet-wide
  • 09:57 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=echostore
  • 09:40 _joe_: restarting pybal on lvs1015 to pick up the addition of echostore
  • 09:37 ema: pool cp4028 with ATS backend T227432
  • 09:36 _joe_: restarting pybal on lvs2003 to pick up the addition of echostore
  • 09:34 _joe_: restarting pybal on lvs1016 to pick up the addition of echostore
  • 09:20 _joe_: restarting pybal on lvs2006 to pick up the addition of echostore
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=echostore
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 moritzm: importing debdeploy 0.0.99.12 to apt.wikimedia.org
  • 09:13 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:11 _joe_: hotpatching puppet-merge on puppetmaster1001
  • 08:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:32 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:03 ema: depool cp4028 and reimage as text_ats T227432
  • 07:58 marostegui: Deploy schema change on db1076
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P9388 and previous config saved to /var/cache/conftool/dbconfig/20191018-075709-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P9387 and previous config saved to /var/cache/conftool/dbconfig/20191018-075529-marostegui.json
  • 07:21 moritzm: installing unbound security updates on buster
  • 07:20 moritzm: installing libdatetime-timezone-perl updates (time zone updates)
  • 05:53 vgutierrez: switch cp1084 from nginx to ats-tls - T231433
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:32 vgutierrez: switch cp2014 from nginx to ats-tls - T231433
  • 05:19 marostegui: Rename m5 labtestwiki database - T233236
  • 05:15 marostegui: Deploy schema change on db1129 T233135 T234066
  • 05:15 marostegui: Compress tables on db2091:3314 T235599
  • 05:14 vgutierrez: switch cp3039 from nginx to ats-tls - T231433
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P9386 and previous config saved to /var/cache/conftool/dbconfig/20191018-051355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 and db2086:3318 after table compression', diff saved to https://phabricator.wikimedia.org/P9385 and previous config saved to /var/cache/conftool/dbconfig/20191018-050831-marostegui.json
  • 04:57 vgutierrez: switch cp4025 from nginx to ats-tls - T231433
  • 04:34 vgutierrez: switch cp5005 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: restarting nagios-nrpe-server on stat1007

2019-10-17

  • 21:42 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673 (duration: 05m 38s)
  • 21:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673
  • 19:31 eileen: civicrm revision changed from 4eac801762 to ff69d64ad4, config revision is dc3a88889d
  • 18:26 mutante: wtp1025 - cd /srv/deployment/parsoid/deploy/src ; sudo -u deploy-service ln -s ../vendor (for benchmarking test)
  • 18:01 _joe_: depooled wtp1025 from parsoid, parsoid-php to allow running benchmarks there
  • 18:01 elukey: update librdkafka on eventlog1002 and restart eventlogging
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 and remove db1136 from its temporary vslow,dump role', diff saved to https://phabricator.wikimedia.org/P9382 and previous config saved to /var/cache/conftool/dbconfig/20191017-151952-marostegui.json
  • 15:07 dcausse: unbanning elastic1050:psi
  • 15:01 dcausse: dumping jvm heap on elastic1050:psi to investigate gc issues
  • 14:46 moritzm: installing 4.9.189 Linux update on jessie hosts (no reboots, deploying the package only at this point)
  • 14:37 dcausse: banning elastic1050:psi to investigate gc issues
  • 14:32 moritzm: uploaded linux-meta 1.22 for jessie-wikimedia
  • 14:32 bblack: disable puppet on cache fleet (cp*) ahead of cert deployment refactoring - T234803
  • 14:09 cdanis: install1002 - sudo -E reprepro --restrict grafana update buster-wikimedia
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9381 and previous config saved to /var/cache/conftool/dbconfig/20191017-134112-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9380 and previous config saved to /var/cache/conftool/dbconfig/20191017-133047-marostegui.json
  • 13:06 XioNoX: rollback failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 12:56 XioNoX: restart mr1-eqiad
  • 12:54 XioNoX: downtiming all mgmt host for 30min (mr1-eqiad needs to be rebooted)
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9379 and previous config saved to /var/cache/conftool/dbconfig/20191017-125248-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9378 and previous config saved to /var/cache/conftool/dbconfig/20191017-125154-marostegui.json
  • 12:50 marostegui: Compress tables on db2088:3312 - T235599
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9377 and previous config saved to /var/cache/conftool/dbconfig/20191017-124503-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1090:3312 original weight', diff saved to https://phabricator.wikimedia.org/P9376 and previous config saved to /var/cache/conftool/dbconfig/20191017-121330-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9375 and previous config saved to /var/cache/conftool/dbconfig/20191017-121106-marostegui.json
  • 11:39 ema: pool cp4027 with ATS backend T227432
  • 11:36 vgutierrez: upgrading ATS on eqiad nodes to 8.0.5-1wm9 - T234011
  • 11:27 vgutierrez: upgrading ATS on codfw nodes to 8.0.5-1wm9 - T234011
  • 11:27 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4027.ulsfo.wmnet,service=ats-be
  • 11:16 vgutierrez: upgrading ATS on esams nodes to 8.0.5-1wm9 - T234011
  • 11:11 Urbanecm: EU SWAT done
  • 11:11 XioNoX: failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 36d4612: Allow sysops to add transwiki on nnwiki, and add import sources (T231761) (duration: 00m 59s)
  • 11:09 vgutierrez: upgrading ATS on ulsfo nodes to 8.0.5-1wm9 - T234011
  • 11:08 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikibaseMediaInfo: SWAT: 5a67011: Keep track of assigned nodes in both old & new DOM (T235236) (duration: 01m 03s)
  • 10:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 ema: depool cp4027 and reimage as text_ats T227432
  • 10:31 effie: depool mw1333
  • 10:25 elukey: rollback eventlogging back to Python 2, some errors (unseen in tests) logged by the processors
  • 10:24 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3 (duration: 00m 03s)
  • 10:24 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3
  • 10:19 elukey: Move eventlogging on eventlog1002 to Python3
  • 10:17 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3 (duration: 00m 05s)
  • 10:17 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3
  • 09:57 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 09:39 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:38 marostegui: Stop MySQL on db1129 for PDU work
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for PDU work, give some traffic to db1090:3312 meanwhile T227133', diff saved to https://phabricator.wikimedia.org/P9374 and previous config saved to /var/cache/conftool/dbconfig/20191017-093753-marostegui.json
  • 09:27 elukey: upload archiva 2.2.4-1 to stretch-wikimedia - T222595
  • 09:26 marostegui: Stop MySQL on db1117 this will generate some haproxy alerts - T227133
  • 08:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:26 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:05 vgutierrez: upgrading ATS on eqsin nodes to 8.0.5-1wm9 - T234011
  • 08:03 marostegui: Deploy schema change on db1090:3317
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db1136 weight', diff saved to https://phabricator.wikimedia.org/P9373 and previous config saved to /var/cache/conftool/dbconfig/20191017-080157-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 pool db1136 temporarily into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9372 and previous config saved to /var/cache/conftool/dbconfig/20191017-080026-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P9371 and previous config saved to /var/cache/conftool/dbconfig/20191017-074658-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 (non partitioned host) into s5 special group with low weight - T223151', diff saved to https://phabricator.wikimedia.org/P9370 and previous config saved to /var/cache/conftool/dbconfig/20191017-071308-marostegui.json
  • 06:06 elukey: upgrade archiva on archiva1001 to 2.2.4 - T222595
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from x to x100 on s5 - T231018', diff saved to https://phabricator.wikimedia.org/P9369 and previous config saved to /var/cache/conftool/dbconfig/20191017-060251-marostegui.json
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui: Deploy schema change on labtestwiki and labswiki
  • 05:12 marostegui: Deploy schema change on db1095:3312
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 and db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P9368 and previous config saved to /var/cache/conftool/dbconfig/20191017-051055-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 and db1094', diff saved to https://phabricator.wikimedia.org/P9367 and previous config saved to /var/cache/conftool/dbconfig/20191017-050614-marostegui.json
  • 05:01 vgutierrez: upgrading ATS to 8.0.5-1wm9 on cp5001 - T234011
  • 05:00 vgutierrez: uploaded trafficserver 8.0.5-1wm9 to apt.wikimedia.org (stretch) - T234011
  • 02:04 bblack: repooling eqsin
  • 00:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2019-10-16

  • 23:17 Urbanecm: Evening SWAT done
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: Clean expired rules (duration: 00m 58s)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-1.5x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-2x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki.png (T235710)
  • 23:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 9c5bcd8: Change logo for azwiki (T235710) (duration: 00m 59s)
  • 23:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6dc4c0c: New throttle rule for WMCL editathon (T235693) (duration: 00m 59s)
  • 23:09 @: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96c87c7: Enable transwiki import from other Wikipedias on srwikisource (T235419) (duration: 00m 58s)
  • 23:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:00 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 22:42 James_F: Zuul: Add composer-php72-docker for wikimedia-cz/web-theme and wikimedia-cz/web-plugin
  • 22:31 mutante: mwmaint1002 - running generate-fancy-captcha-loop to work around issue with generate-captcha cron (T230245)
  • 22:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:29 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/OutputPage.php: T235711 Lower severity of targets violation back to DEBUG (duration: 00m 59s)
  • 21:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikiEditor: T235701 Revert removal of jquery.tabIndex (duration: 00m 59s)
  • 21:47 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:44 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:42 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:41 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 21:10 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 20:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 20:41 ejegg: rolled back fundraising python tools from 31171f148c to b3c7453be2
  • 20:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/resourceloader/ResourceLoaderStartUpModule.php: Expose StartupModule::getConfigSettings for internal use T235350 T229836 (duration: 00m 59s)
  • 20:07 joal@deploy1001: Finished deploy [analytics/refinery@1704fdd]: Regular analytics weekly train (duration: 17m 06s)
  • 20:00 urandom: upgrading Cassandra to 3.11.4, codfw, rack d -- T200803
  • 19:50 joal@deploy1001: Started deploy [analytics/refinery@1704fdd]: Regular analytics weekly train
  • 19:35 urandom: upgrading Cassandra to 3.11.4, codfw, rack c -- T200803
  • 19:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.34.0-wmf.25 (duration: 03m 24s)
  • 19:18 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix (duration: 05m 53s)
  • 19:13 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix
  • 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.2 refs T233850 (duration: 00m 59s)
  • 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.2 refs T233850
  • 19:06 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint) (duration: 01m 18s)
  • 19:05 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint)
  • 18:46 urandom: upgrading Cassandra to 3.11.4, codfw, rack b -- T200803
  • 18:28 urandom: upgrading Cassandra to 3.11.4, eqiad, rack d -- T200803
  • 18:06 urandom: upgrading Cassandra to 3.11.4, eqiad, rack b -- T200803
  • 16:33 urandom: upgrading Cassandra to 3.11.4, eqiad, rack a -- T200803
  • 16:17 catrope@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/GrowthExperiments/: Fix help panel button alignment (T235578) (duration: 01m 02s)
  • 16:16 mutante: ganeti1003 - shutting down and removing instance moscovium.eqiad.wmnet - recreating under same name with cookbook
  • 15:59 mutante: new dsh group parsoid_php created - parsoid-php servers added to scap / mediawiki-installation dsh group
  • 15:17 marostegui: Deploy schema change on dbstore1004:3312 - T234066 T233135
  • 15:09 marostegui: Recreate views for protected_titles on s2 and s7 on labsdb1009 and labsdb1012 - T233135
  • 15:04 mutante: wtp1025 wtp2001 - scap pull (T233654)
  • 15:04 mutante: wtp parsoid servers added to conftool - wtp1025 and wtp2001 pooled in new service parsoid-php (T233654)
  • 15:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 14:53 effie: Remove tex* and math related packages from deploy*,mwmaint*,snapshot* - T195847
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:26 papaul: power down puppetmaster2001 for HW maintenance
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:24 _joe_: creating namespaces and policies for echostore in codfw, T234376
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:10 moritzm: installing idp2001
  • 13:56 jynus: reenabling puppet on helium T229209
  • 13:46 XioNoX: rollback failover VRRP from cr1-eqiad to cr2-eqiad - T226782
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 and db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P9364 and previous config saved to /var/cache/conftool/dbconfig/20191016-132620-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P9363 and previous config saved to /var/cache/conftool/dbconfig/20191016-131010-marostegui.json
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P9362 and previous config saved to /var/cache/conftool/dbconfig/20191016-125102-marostegui.json
  • 12:38 effie: remove tex* and math related packages from appserver canaries - T195847
  • 12:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540 (duration: 03m 40s)
  • 12:29 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:26 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540
  • 12:20 marostegui: Compress tables on db1099:3311 - T235599
  • 12:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c90503b]: Revert to fix T235540 (duration: 19m 09s)
  • 12:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:00 kart_: Updated cxserver to 2019-10-15-091114-production (T234773, T217585)
  • 11:57 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:56 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c90503b]: Revert to fix T235540
  • 11:49 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT (duration: 10m 13s)
  • 11:46 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:39 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT
  • 11:34 Lucas_WMDE: EU SWAT done
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: extension-list: Load FlaggedRevs via extension.json (T87915, T139800, T140852) (duration: 01m 05s)
  • 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure Citoid+Wikibase integration on Test Wikidata (T228412) (duration: 01m 13s)
  • 11:14 _joe_: purging confd from wtp* servers, not needed anymore
  • 10:48 _joe_: upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run
  • 10:31 elukey: upload prometheus-memcached-exporter 0.4.1+git20181010.2fa99eb-1+deb10u1 to buster-wikimedia - T213089
  • 10:17 marostegui: Stop replication on s2 codfw master for schema change and to modify sanitarium triggers T234066 T233135 T234704
  • 09:40 effie: enable puppet on all hosts running hhvm - T229792
  • 09:36 XioNoX: restart fastnetmon on netflow2001
  • 09:27 effie: Disable puppet on all hosts running hhvm to merge 543131 - T229792
  • 09:22 effie: Disable puppet on mw* hosts to merge 543131
  • 09:20 gehel: force merging commonswiki_content on elasticsearch codfw
  • 08:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:15 _joe_: upgrading envoyproxy in production to 1.11.2 T235412
  • 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9360 and previous config saved to /var/cache/conftool/dbconfig/20191016-052104-marostegui.json
  • 05:18 marostegui: Deploy schema change on s2 sanitarium master (db1074) this will create lag on s2 labsdb T233135 T234066
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P9359 and previous config saved to /var/cache/conftool/dbconfig/20191016-051812-marostegui.json
  • 05:14 marostegui: Change s7 triggers for archive table from db1125:3317 T234704
  • 05:11 marostegui: Change s2 triggers for archive table from db1125:3312 T234704
  • 05:08 marostegui: Deploy schema change on s7 sanitarium master (db1079) this will create lag on s7 labsdb T233135 T234066
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P9358 and previous config saved to /var/cache/conftool/dbconfig/20191016-050627-marostegui.json
  • 03:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465 (duration: 13m 37s)
  • 03:35 mobrovac@deploy1001: Started deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465
  • 01:55 eileen: civicrm revision changed from 5a2f8048c4 to 4eac801762, config revision is dc3a88889d
  • 00:09 mutante: wikitech - make JBond a "content administrator" to give the ability to create server fingerprint pages

2019-10-15

  • 22:41 Reedy: manually running `extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php` T230245
  • 21:26 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Provide getCachableMWConfig() which doesn't rely on wgConf (duration: 01m 00s)
  • 21:24 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408) (duration: 05m 35s)
  • 21:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408)
  • 21:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings: Stop writing wmgScoreFileBackend and wmgScorePath, never read (duration: 00m 59s)
  • 21:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Stop using wmg variables for Score extension (duration: 01m 01s)
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write wgScoreFileBackend and wgScorePath directly, not via CommonSettings (duration: 01m 00s)
  • 20:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.2 refs T233850
  • 19:55 urandom: upgrade restbase2011-{a,b,c} to cassandra 3.11.4 -- T200803
  • 19:52 urandom: upgrade restbase1016-c to cassandra 3.11.4 -- T200803
  • 19:48 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.2 refs T233850 (duration: 27m 39s)
  • 19:48 urandom: upgrade restbase1016-b to cassandra 3.11.4 -- T200803
  • 19:42 urandom: upgrade restbase1016-a to cassandra 3.11.4 -- T200803
  • 19:20 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.2 refs T233850
  • 19:07 mutante: LDAP - adding user rzl to groups wmf and ops (T235215)
  • 17:51 longma: cutting the branch for 1.35.0-wmf.2 T233850
  • 16:28 ejegg: updated payments-wiki from c3cc3ace2f to 570324a30f
  • 16:24 papaul: power down lvs2010 for HW maintenance
  • 16:00 _joe_: uploading envoy 1.11.2 to stretch-wikimedia, buster-wikimedia T230779 T235412
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9355 and previous config saved to /var/cache/conftool/dbconfig/20191015-155454-marostegui.json
  • 15:52 papaul: power down lvs2009 for HW maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9354 and previous config saved to /var/cache/conftool/dbconfig/20191015-154325-marostegui.json
  • 15:17 ejegg: updated payments-wiki from 8a65f57874 to c3cc3ace2f
  • 15:01 moritzm: installing fribidi bugfix updates from stretch point release
  • 14:54 moritzm: installing cups security updates for stretch (client-side libs/tools only)
  • 14:43 elukey: start a root tmux containing a bash script on conf1004 to clean up znodes under /yarn-rmstore/analytics-hadoop/ZKRMStateRoot/RMAppRoot slowly - T217057
  • 14:40 papaul: power down puppetmaster2002 for HW maintenance
  • 14:38 moritzm: installing usbutils update from stretch point release
  • 14:34 elukey: executed 'rmr' in zookeeper on conf1004 for znodes /yarn-leader-election /hadoop-ha /hive_zookeeper_namespace
  • 14:12 ejegg: updated fundraising python tools from b3c7453be2 to 31171f148c
  • 13:53 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9353 and previous config saved to /var/cache/conftool/dbconfig/20191015-130356-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9352 and previous config saved to /var/cache/conftool/dbconfig/20191015-124942-marostegui.json
  • 12:46 elukey: Hadoop maintenance over
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9351 and previous config saved to /var/cache/conftool/dbconfig/20191015-123356-marostegui.json
  • 12:24 mobrovac: restbase add parsoidphp tables in prod - T230792
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9350 and previous config saved to /var/cache/conftool/dbconfig/20191015-121840-marostegui.json
  • 12:17 marostegui: Repool labsdb1009 after PDU maintenance
  • 12:17 elukey: Hadoop maintenance start - migration to the new ZooKeeper cluster
  • 12:16 moritzm: installing sudo security updates on buster/stretch
  • 12:13 arturo: add copy of python-pykube and python3-pykube from stretch-wikimedia to buster-wikimedia (T230961)
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 hashar: CI Jenkins restarted
  • 12:04 hashar: Restarting CI Jenkins
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9348 and previous config saved to /var/cache/conftool/dbconfig/20191015-120359-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P9347 and previous config saved to /var/cache/conftool/dbconfig/20191015-120133-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9346 and previous config saved to /var/cache/conftool/dbconfig/20191015-115922-marostegui.json
  • 11:12 Urbanecm: EU SWAT done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ac37540: Add `autopatrol` to translation administrators on mediawiki (duration: 00m 51s)
  • 11:12 jbond42: move puppetmaster_ca_server back to puppetmaster1001
  • 11:08 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip 195.113.145.2 (T235493)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT:855aca4eb: Throttle rule for Czech course (T235493) (duration: 00m 51s)
  • 10:54 moritzm: mark ruby-safe-yaml as manually installed using apt-mark on jessie/stretch, prevents accidental removal of ruby-safe-yaml after puppet 4->5 migration
  • 10:07 moritzm: installing openssl updates for buster (some ciphers we don't use were not enabled due to an upstream change related to the selection of ASM-optimised implementations over generic C)
  • 08:07 marostegui: Stop MySQL on db1126 and labsdb1009 for PDU maintenance - T226782
  • 08:06 elukey: upload new version of memkeys (adding a patch to be merged upstream to avoid segfaults on stretch/buster) to stretch|buster wikimedia apt repos - T223863
  • 07:52 Urbanecm: Set email for `Martin Urbanec (test 10)` to test@wikimedia.cz (debug, no ticket)
  • 07:48 Urbanecm: Password reset for Xaris333 #2 (T235441)
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for PDU maintenance T226782', diff saved to https://phabricator.wikimedia.org/P9345 and previous config saved to /var/cache/conftool/dbconfig/20191015-071338-marostegui.json
  • 07:10 XioNoX: failover VRRP from cr1-eqiad to cr2-eqiad in anticipation of the PDU work - T226782
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 T232446', diff saved to https://phabricator.wikimedia.org/P9344 and previous config saved to /var/cache/conftool/dbconfig/20191015-064419-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1070 T235464', diff saved to https://phabricator.wikimedia.org/P9343 and previous config saved to /var/cache/conftool/dbconfig/20191015-064005-marostegui.json
  • 05:38 marostegui: Depool labsdb1009 for PDU maintenance T226782
  • 05:28 marostegui: Deploy schema change on db1098:3317 T234066 T233135
  • 05:28 marostegui: Deploy schema change on db1097:3314 T233625
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9342 and previous config saved to /var/cache/conftool/dbconfig/20191015-052621-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9341 and previous config saved to /var/cache/conftool/dbconfig/20191015-052220-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P9340 and previous config saved to /var/cache/conftool/dbconfig/20191015-051924-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P9339 and previous config saved to /var/cache/conftool/dbconfig/20191015-051400-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P9338 and previous config saved to /var/cache/conftool/dbconfig/20191015-051236-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1100 to s5 master and remove read-only from s5 T234300', diff saved to https://phabricator.wikimedia.org/P9337 and previous config saved to /var/cache/conftool/dbconfig/20191015-050042-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s5 as read-only for maintenance T234300', diff saved to https://phabricator.wikimedia.org/P9336 and previous config saved to /var/cache/conftool/dbconfig/20191015-050016-marostegui.json
  • 05:00 marostegui: Starting s5 failover from db1070 to db1100 - T234300
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P9335 and previous config saved to /var/cache/conftool/dbconfig/20191015-043403-marostegui.json
  • 04:15 marostegui: Start pre-switchover steps T234300

2019-10-14

  • 23:27 Krinkle: Delete 2019-09-01––2019-09-10 arclamp trace logs from webperf1002, and decompress the rest of 2019-09 (this will trigger svg re-generation), T235425
  • 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 86f12b6e (duration: 00m 51s)
  • 21:47 Krinkle: Deleting 2019-09-01––2019-09-10 arclamp logs on webperf2002, and decompress the rest of 2019-09, T235425
  • 21:12 Krinkle: Delete misc arclamp/logs and arclamp/svgs data from between 2018 and 2019-08 on webperf1002/webperf2002, T235425
  • 20:41 maxsem@deploy1001: Synchronized php-1.35.0-wmf.1/includes/: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542963/ (duration: 00m 55s)
  • 17:56 mutante: webperf2002 - /srv/xenon/logs/daily# gzip 2019-09*excimer*.log (T235425)
  • 17:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates (duration: 16m 45s)
  • 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates
  • 16:07 moritzm: imported cergen 0.2.4-1+deb10u3 to component/cergen for buster-wikimedia T235405
  • 16:00 Urbanecm: Password reset for Xaris333 (T235441)
  • 15:57 moritzm: imported cergen 0.2.4-1+deb10u2 to component/cergen for buster-wikimedia T235405
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9329 and previous config saved to /var/cache/conftool/dbconfig/20191014-142843-marostegui.json
  • 14:28 elukey: upload matomo 3.11 to stretch-wikimedia and upgrade matomo1001 - T234607
  • 14:21 marostegui: Deploy schema change on db1116:3317 T234066 T233135
  • 14:13 effie: Enable puppet on mw* servers and reload apache - T229792
  • 13:48 moritzm: imported cergen 0.2.4-1+deb10u1 to component/cergen for buster-wikimedia T235405
  • 13:42 marostegui: Repool labsdb1009 after PSU replacement - T233273
  • 13:36 effie: Slowly enable puppet on mw* canaries
  • 13:26 moritzm: imported python-networkx 1.11-2~wmf1 to component/cergen for buster-wikimedia T235405
  • 13:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:19 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:18 effie: Disable puppet on mw* to remove php72_only feature flag - T229792
  • 13:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 245b4e5: Add banwiki logo to IS.php (T234768) (duration: 00m 51s)
  • 13:12 Urbanecm: Run git reset --hard origin/master in /srv/mediawiki-staging (deleted https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542920 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542919 from deployment srv, both don't actually change anything => safe to delete) (T234768)
  • 13:10 marostegui: Sanitize banwiki on db1124:3313 and db2094:3313 T234770
  • 12:44 Amir1: Creating banwiki is banned (done)
  • 12:40 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
  • 12:34 ladsgroup@deploy1001: Synchronized langlist: Creating banwiki: T234768 (duration: 00m 50s)
  • 12:32 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating banwiki: T234768
  • 12:20 ladsgroup@deploy1001: Synchronized dblists: Creating banwiki: T234768 (duration: 00m 52s)
  • 12:10 tarrow@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/Wikibase: SWAT: Bump up Termbox cache version (T235192) (duration: 00m 56s)
  • 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reftabs on testwikidata (T199197, T228412) (duration: 00m 51s)
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a295cc7: Fix wrong domain in wgCopyUploadDomains added in T203363 (T235415) (duration: 00m 51s)
  • 11:27 kart_: Update cxserver to 2019-10-03-054958-production (T232986)
  • 11:22 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:17 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 538867|Use ContentTranslationEnableMT to disable MT (T232986) (duration: 00m 51s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9326 and previous config saved to /var/cache/conftool/dbconfig/20191014-100758-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 into s5 api, db1100 will be removed later in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9325 and previous config saved to /var/cache/conftool/dbconfig/20191014-094809-marostegui.json
  • 09:34 hashar: Upgraded CI jobs to Quibble 0.0.38
  • 09:14 marostegui: Deploy schema change on dbstore1003:3317
  • 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:55 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:52 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 and db2126 after changing sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9322 and previous config saved to /var/cache/conftool/dbconfig/20191014-085143-marostegui.json
  • 08:46 mobrovac: restbase drop metadata keyspaces from cassandra - T235173
  • 07:54 marostegui: Stop db1074 and db2126 in sync to change sanitarium's master for s2 - T231638
  • 07:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata (duration: 03m 58s)
  • 07:45 mobrovac@deploy1001: Started deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata
  • 07:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173 (duration: 13m 37s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db2126 to change sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9320 and previous config saved to /var/cache/conftool/dbconfig/20191014-073319-marostegui.json
  • 07:28 mobrovac@deploy1001: Started deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173
  • 07:28 mobrovac@deploy1001: Finished deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173 (duration: 01m 25s)
  • 07:26 mobrovac@deploy1001: Started deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json
  • 07:16 marostegui: Stop MySQL on labsdb1009 for on-site maintenance - T233273
  • 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)
  • 05:47 marostegui: Remove db2068 from tendril and zarcillo T235399
  • 04:56 marostegui: Depool labsdb1009 for on-site maintenance - T233273
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9318 and previous config saved to /var/cache/conftool/dbconfig/20191014-045629-marostegui.json

2019-10-13

  • 00:52 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ec77b1b (duration: 00m 55s)

2019-10-12

  • 23:21 krinkle@deploy1001: Synchronized wmf-config/profiler.php: bfa8bb69c1f, T231564 (duration: 00m 51s)
  • 21:07 krinkle@deploy1001: Synchronized php-1.35.0-wmf.1/includes/resourceloader/ResourceLoaderStartUpModule.php: 8c6baeae2 (duration: 00m 53s)
  • 20:57 Urbanecm: Reset user email of User:Gardini (T235318)
  • 18:38 _joe_: deleting zotero pods with excessive memory usage in eqiad
  • 16:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: T235334 (duration: 00m 51s)
  • 16:15 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBacklinksprop.php: T235334 (duration: 00m 56s)
  • 04:37 krinkle@deploy1001: Synchronized wmf-config/profiler.php: 29d8469 (duration: 00m 57s)

2019-10-11

  • 15:39 AndyRussG: updated fruec from 18d89675d0 to 1e6a6ee2de
  • 13:57 moritzm: rebooting cloudbackup2001
  • 13:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 12:48 XioNoX: disable SIP ALG on pfw3-eqiad - T235150
  • 12:47 XioNoX: disable SIP ALG on pfw3-codfw - T235150
  • 12:45 moritzm: installing libxslt security updates
  • 12:35 moritzm: installing zsh updates from stretch point release
  • 12:33 moritzm: installing gsoap security updates on stretch
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
  • 12:31 moritzm: installing libcaca security updates on stretch
  • 12:25 XioNoX: push firewall policies to pfw3-eqiad - T235074
  • 12:24 XioNoX: push firewall policies to pfw3-codfw - T235074
  • 11:51 moritzm: installing unzip security updates on stretch
  • 11:08 moritzm: upgrading debdeploy to 0.0.99.11
  • 10:18 moritzm: imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
  • 10:11 hashar: Restarting Gerrit # T224448
  • 10:02 hashar: gerrit: killed a stalled SendEmail thread that was holding a lock
  • 08:34 moritzm: remove kafka2001-2003 from debmonitor DB (T235125)
  • 08:32 moritzm: remove kafka1001-1003 from debmonitor DB (T235125)
  • 08:30 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 moritzm: reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
  • 07:32 XioNoX: roll back the two previous HE peering deactivations
  • 07:30 XioNoX: deactivate HE peering on cr2-eqord for packet loss
  • 07:28 XioNoX: deactivate HE peering on cr1-eqiad for packet loss
  • 06:13 marostegui: Compress tables on db2085:3318 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
  • 05:27 papaul: rebooting an-conf1001 for serial troubleshooting
  • 05:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
  • 02:14 mutante: gerrit - "manually" starting replication via ssh command
  • 02:13 mutante: gerrit - restart service to ensure last config change is picked up
  • 02:10 mutante: gerrit1001 - attempt to manually start replication to github

2019-10-10

  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMFMobileFormatterHeadings, unread T232690 (duration: 00m 51s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Update cron-updated miser pages to say they are run periodically, not never (duration: 00m 51s)
  • 22:10 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Remove debug line dating from 2015-12-08! (duration: 00m 51s)
  • 22:04 jforrester@deploy1001: Synchronized wmf-config/mc.php: Drop nutcracker indirection for HHVM servers, just point to localhost (duration: 00m 51s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Drop special-case for PHP7, now always used (duration: 00m 51s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop HHVM special-case for SVG converter, no longer used (duration: 00m 51s)
  • 21:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't check to shard static config cache for HHVM any more (duration: 00m 50s)
  • 21:48 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Don't check to shard wmgWBSharedCacheKey for HHVM any more (duration: 00m 51s)
  • 21:39 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/lib/ve/src/dm/ve.dm.TreeCursor.js: T234881 TreeCursor: cross ignored nodes properly from the end of a text node (duration: 00m 54s)
  • 20:36 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004 (duration: 00m 06s)
  • 20:36 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004
  • 20:13 hoo: Updated the Wikidata property suggester with data from the 2019-09-30 JSON dump and applied the T132839 workarounds
  • 19:33 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 19:29 marxarelli: promoted 1.35.0-wmf.1 to all wikis. no rise in errors rates. no new relevant errors cc: T233849
  • 19:25 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.1
  • 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1
  • 19:09 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s)
  • 19:04 marxarelli: promoting labswiki to 1.35.0-wmf.1 cc: T233849
  • 17:07 jbond42: puppetmaster1001 has been upgraded and is back serving requests
  • 16:21 urandom: Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803
  • 16:18 urandom: Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:16 urandom: Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:11 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:07 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:04 thcipriani: restarting gerrit due to T224448
  • 16:04 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:01 urandom: Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 15:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s)
  • 15:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json
  • 14:54 moritzm: ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage)
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json
  • 14:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 14:03 jbond42: re-enable puppet now ca has been correctly moved
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json
  • 13:50 jbond42: disable puppet fleet wide as puppetmaster2002 is struggling
  • 13:32 jbond42: reimage puppetmaster1001
  • 13:27 marostegui: Repool labsdb1011 after reclone - T235016
  • 13:16 arturo: added flannel 0.5.5-4 to buster-wikimedia (T235059)
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s)
  • 13:00 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 12:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 11:57 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:57 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:48 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:46 jbond@cumin2001: Updating IPMI password on 35 hosts - jbond@cumin2001
  • 11:46 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Fix typo in beta repo data bridge config (T235033) (duration: 00m 59s)
  • 11:40 marostegui: Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135
  • 11:38 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:38 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:38 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:37 arturo: icinga downtime cloudvirt1023 for 2h (T227536)
  • 11:36 arturo: icinga downtime cloudvirt1025 for 2h (T227536)
  • 11:36 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:36 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:36 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:35 arturo: icinga downtime cloudvirt1026 for 2h (T227536)
  • 11:35 marostegui: Stop replication on db2077 to change triggers on db2095:3317 - T234704
  • 11:23 moritzm: installing reportbug updates from stretch point release
  • 11:22 Lucas_WMDE: EU SWAT done
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Set dataBridgeEnabled repo setting on beta (T235033) (affects InitialiseSettings-labs.php and Wikibase.php, but Wikibase.php part is guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:14 Lucas_WMDE: ^ (and by CS, I actually mean Wikibase.php, not CommonSettings.php, sorry)
  • 11:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Rename data bridge config variable names (T235033) (affects IS-labs and CS, but the CS part is all guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 10:38 moritzm: rebalancing Ganeti eqiad/row C after rolling reboots of Ganeti nodes
  • 10:34 volans: uploaded spicerack_0.0.28-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 08:23 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:12 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP - T233654 (duration: 01m 01s)
  • 07:55 marostegui: Stop MySQL on es1014 es1013 db1084 db1083 db1077 db1076 db1112 db1124 db1118 for on-site PDU maintenance (this will generate lag on labsdb hosts) - T227536
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:56 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Drop designate_pool_manager database from m5 - T233978
  • 06:33 marostegui: Revoke privileges from designate user on the designate_pool_manager database - T233978
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for PDU maintenance T227536', diff saved to https://phabricator.wikimedia.org/P9294 and previous config saved to /var/cache/conftool/dbconfig/20191010-055153-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1078 into rc service for s3 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9293 and previous config saved to /var/cache/conftool/dbconfig/20191010-055102-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 db1083 db1076 db1118 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9292 and previous config saved to /var/cache/conftool/dbconfig/20191010-054853-marostegui.json
  • 05:47 marostegui: Depool db1084 db1083 db1076 db1118 for PDU maintenance - T227536
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 marostegui: Deploy schema change on db1061 (s6 eqiad master) - T233135 T234066
  • 04:43 marostegui: Depool labsdb1011 for recloning - T235016
  • 00:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 00:39 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 00:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 00:38 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset

2019-10-09

  • 23:55 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 03m 57s)
  • 23:51 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: (no justification provided)
  • 23:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable AMC on all wikis (T233612) (duration: 00m 58s)
  • 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Turn on AMC outreach modal (T234026) (duration: 00m 59s)
  • 22:01 mutante: restarting gerrit to revert replication config change (T235135)
  • 21:27 godog: swift eqiad-prod: add ms-be105[1-6] - T232367
  • 21:02 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: (no justification provided) (duration: 00m 02s)
  • 21:02 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 21:02 otto@deploy1001: deploy aborted: (no justification provided) (duration: 38m 29s)
  • 20:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006 (duration: 01m 44s)
  • 20:53 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006
  • 20:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds (duration: 02m 42s)
  • 20:41 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds
  • 20:31 papaul: rebooting ms-be1051 to access BIOS
  • 20:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e (duration: 06m 22s)
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 20:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 00m 10s)
  • 20:16 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 05m 34s)
  • 20:10 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:09 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 02m 23s)
  • 20:06 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:56 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 00m 12s)
  • 19:54 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:54 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:52 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 08m 00s)
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:44 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 09m 33s)
  • 19:34 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:25 marxarelli: 1.35.0-wmf.1 promoted to group1, labswiki rolled back to 1.34.0-wmf.25 and to be kept back, cc: T233849
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki rollback to 1.34.0-wmf.25 due to hhvm
  • 19:09 urandom: Upgrade restbase-dev1006-{a,b} to Cassandra 3.11.4 -- T200803
  • 19:09 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.1 (duration: 00m 58s)
  • 19:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.1
  • 18:51 urandom: Upgrade restbase-dev1005-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:45 urandom: Upgrade restbase-dev1004-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:43 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid config changes
  • 17:19 eileen: civicrm revision changed from 2ba100486e to 5a2f8048c4, config revision is 5560cc0878
  • 16:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:48 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9289 and previous config saved to /var/cache/conftool/dbconfig/20191009-160506-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9288 and previous config saved to /var/cache/conftool/dbconfig/20191009-153705-marostegui.json
  • 15:04 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:02 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1085 vslow and dump group', diff saved to https://phabricator.wikimedia.org/P9287 and previous config saved to /var/cache/conftool/dbconfig/20191009-145102-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9286 and previous config saved to /var/cache/conftool/dbconfig/20191009-144928-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9285 and previous config saved to /var/cache/conftool/dbconfig/20191009-144607-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9284 and previous config saved to /var/cache/conftool/dbconfig/20191009-144400-marostegui.json
  • 14:38 elukey: cr1-eqsin: change IPv6 address for BGP peer AS4761
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9283 and previous config saved to /var/cache/conftool/dbconfig/20191009-141137-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9282 and previous config saved to /var/cache/conftool/dbconfig/20191009-140749-marostegui.json
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: rebalancing Ganeti eqiad/row A after rolling reboots of Ganeti nodes
  • 13:48 jbond42: reimage puppetmaster2001
  • 13:37 vgutierrez: repooling cp1085 - T231525
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1075', diff saved to https://phabricator.wikimedia.org/P9280 and previous config saved to /var/cache/conftool/dbconfig/20191009-133709-marostegui.json
  • 13:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928 (duration: 14m 26s)
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9279 and previous config saved to /var/cache/conftool/dbconfig/20191009-125641-marostegui.json
  • 12:42 marostegui: Stop MySQL and power off db1074 for BBU replacement T231638
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9278 and previous config saved to /var/cache/conftool/dbconfig/20191009-124218-marostegui.json
  • 12:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2 (duration: 08m 18s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9277 and previous config saved to /var/cache/conftool/dbconfig/20191009-124035-marostegui.json
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 moritzm: disabled puppet on DNS recursors for staged rollout of ferm NTP change
  • 12:35 jbond42: reimage puppetmaster2002
  • 12:32 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2
  • 12:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928 (duration: 09m 40s)
  • 12:28 vgutierrez: depooling cp1085 for a power drain - T231525
  • 12:20 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928
  • 12:13 moritzm: draining ganeti1001 for upcoming reboot (combined kernel/qemu security updates)
  • 12:10 moritzm: failover Ganeti master in eqiad to ganeti1003
  • 12:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:32 moritzm: draining ganeti1008 for upcoming reboot (combined kernel/qemu security updates)
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 Amir1: EU SWAT is done
  • 11:04 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put write both limit down to Q70m for item terms (T234948) (duration: 01m 10s)
  • 11:04 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:58 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:18 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:44 moritzm: draining ganeti1007 for upcoming reboot (combined kernel/qemu security updates)
  • 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:59 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change, temporarily pool db1085 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9276 and previous config saved to /var/cache/conftool/dbconfig/20191009-085016-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P9275 and previous config saved to /var/cache/conftool/dbconfig/20191009-084732-marostegui.json
  • 08:39 vgutierrez: Switch cp1082 from nginx to ats-tls - T231433
  • 08:24 moritzm: draining ganeti1006 for upcoming reboot (combined kernel/qemu security updates)
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: Switch cp2011 from nginx to ats-tls - T231433
  • 07:48 moritzm: reduced RAM assignment for boron to 8G
  • 07:38 vgutierrez: Switch cp3038 from nginx to ats-tls - T231433
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:34 vgutierrez: switching from nginx to ats-tls on cp4024 - T231433
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013, es1014 T227536 (duration: 01m 00s)
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change - lag will be generated on s6 labs', diff saved to https://phabricator.wikimedia.org/P9274 and previous config saved to /var/cache/conftool/dbconfig/20191009-051911-marostegui.json
  • 05:11 marostegui: Restart gerrit as it is down
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P9273 and previous config saved to /var/cache/conftool/dbconfig/20191009-045941-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312', diff saved to https://phabricator.wikimedia.org/P9272 and previous config saved to /var/cache/conftool/dbconfig/20191009-044752-marostegui.json
  • 04:40 vgutierrez: switching cp5004 from nginx to ats-tls - T231433

2019-10-08

  • 23:28 mutante: phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
  • 23:05 ebernhardson@deploy1001: Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration so it can be more easily overridden (duration: 00m 59s)
  • 21:28 XenoRyet: updated payments-wiki from d2e2637275 to 8a65f57874
  • 21:09 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 20:38 mutante: labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
  • 20:24 mutante: labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php and change "false" to "true" on line 52 to enable LDAP debug logging for T234996
  • 19:51 marxarelli: 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
  • 19:43 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
  • 19:38 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
  • 19:29 shdubsh: adding swagger exporter to apt repo
  • 19:13 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
  • 18:54 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
  • 18:53 godog: codfw-prod: more weight to ms-be205[1-6] - T233638
  • 18:45 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
  • 17:32 marxarelli: cutting wmf/1.35.0-wmf.1
  • 16:17 cstone: civicrm revision changed from db7ef10bfa to 2ba100486e
  • 16:00 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:30 XioNoX: remove 2 more sessions to AS12871 on cr2-esams - T232617
  • 15:20 XioNoX: add BGP sessions to AS199524 on cr2-eqdfw
  • 15:18 XioNoX: add BGP sessions to AS2635 on cr2-eqiad
  • 15:13 XioNoX: renumber BGP session to AS4761 on cr1-eqsin
  • 13:53 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:51 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
  • 13:50 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
  • 13:49 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 marostegui@cumin2001: dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
  • 13:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
  • 13:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
  • 13:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
  • 12:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
  • 12:27 marostegui: Stop MySQL on es1012 for onsite maintenance
  • 12:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:10 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:58 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:57 jbond42: testing ipmi reset cookbook, using the current pass for both old and new so no reset actually occurs
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:57 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:22 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:21 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 moritzm: draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
  • 10:16 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
  • 10:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:09 mobrovac@deploy1001: Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
  • 09:20 marostegui: Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
  • 09:09 moritzm: draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 mobrovac@deploy1001: Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
  • 08:38 mobrovac@deploy1001: Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
  • 08:33 elukey: roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
  • 08:10 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:10 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:09 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 07:51 moritzm: draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:49 akosiaris: update OTRS to 5.0.38
  • 07:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
  • 07:10 moritzm: draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
  • 06:48 marostegui: Stop MySQL on es1011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
  • 06:09 marostegui: Repool labsdb1011 after mysql upgrade
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:44 elukey: drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
  • 05:35 elukey: drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
  • 05:25 marostegui: Depool labsdb1011 for mysql upgrade
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
  • 05:10 marostegui: Reload query killer on labsdb1011
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
  • 05:07 marostegui: Deploy schema change on db1097:3315 - T233625
  • 03:04 andrewbogott: restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 — experimental band-aid for T234876
  • 00:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)

2019-10-07

  • 23:52 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:26 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 00m 49s)
  • 23:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:21 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)
  • 22:29 chaomodus: restart nagios-nrpe-server on stat1007
  • 21:56 mutante: gerrit2001 - sudo rm /etc/apache2/sites-available/50-gerrit-slave-wikimedia-org.conf
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Run Labs config after CSP config so it can change it (duration: 00m 51s)
  • 21:20 godog: swift codfw-prod: add ms-be205[3456] - T233638
  • 20:56 XenoRyet: updated payments-wiki from b94da68f7e to d2e2637275
  • 20:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:33 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:29 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add the beta REL1_34 to ExtensionDistributor (duration: 00m 50s)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 Lucas_WMDE: Morning SWAT done
  • 19:09 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/Wikibase: SWAT: Revert "Format coordinates with limited precision" (T174504) (duration: 00m 57s)
  • 18:33 Lucas_WMDE: reopen Morning SWAT for another backport (sorry)
  • 18:26 Urbanecm: Morning SWAT done
  • 18:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: 011b6eb: 11033b7: Update VE core submodule to 2ffb699eb (TreeModifier fixes), T234489, T234742 + ve.ui.MWDefinedTransclusionContextItem: Fix handling of template names (T234817) (duration: 00m 53s)
  • 18:16 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/539978
  • 18:12 andrewbogott: apt dist-upgrade on all cloudvirts (for nova upgrades)
  • 18:12 godog: start swiftrepl eqiad -> codfw (no deletes)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f434ae3: Enable NewUserMessage on sq.wikipedia and sq.wikiquote (T234499) (duration: 00m 52s)
  • 18:07 jgleeson: Updating civicrm from c12f7bb51f to db7ef10bfa
  • 17:46 ottomata: stat1007 is unresponsive, can't login via mgmt either. powercycling.
  • 17:29 XioNoX: add BGP route damping on IX sessions - eqiad - T222424
  • 17:27 XioNoX: add BGP route damping on IX sessions - esams - T222424
  • 17:22 XioNoX: add BGP route damping on IX sessions - eqsin - T222424
  • 15:34 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae (duration: 06m 28s)
  • 15:30 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:27 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae
  • 15:27 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop writing wmgVisualEditorEnableNewMobileContext (duration: 00m 51s)
  • 15:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgVisualEditorEnableNewMobileContext (duration: 00m 52s)
  • 14:25 arturo: upgrading openstack in CloudVPS. Some IRC bots and related stuff may be unavailable (T212302)
  • 14:17 marostegui: Deploy schema change on db1139:3316 - T233135 T234066
  • 13:27 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata to write both for item term store (T225055) (duration: 00m 54s)
  • 13:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2 (duration: 06m 38s)
  • 13:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9248 and previous config saved to /var/cache/conftool/dbconfig/20191007-131720-marostegui.json
  • 13:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging (duration: 07m 01s)
  • 13:13 elukey: upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941
  • 13:09 mobrovac@deploy1001: Started deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging
  • 13:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: (no justification provided) (duration: 00m 29s)
  • 13:04 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: (no justification provided)
  • 13:04 mobrovac@deploy1001: deploy aborted: Minor tweaks to VE logging (duration: 01m 07s)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9247 and previous config saved to /var/cache/conftool/dbconfig/20191007-130317-marostegui.json
  • 13:03 mobrovac@deploy1001: Started deploy [restbase/deploy@fe39197]: Minor tweaks to VE logging
  • 12:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restrouter
  • 12:54 elukey: upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941
  • 11:44 Lucas_WMDE: EU SWAT done
  • 11:44 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of main page hack for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:42 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:41 Amir1: another hack bites the dust
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/GrowthExperiments/: SWAT: Homepage: Don't use flexbox for vertical layouts in mobile start module (T234380) (duration: 00m 53s)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on nlwiki (T234685) (duration: 00m 52s)
  • 11:16 arturo: added bdsync 0.11.1-1~wmf1 to buster-wikimedia (T234683)
  • 10:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5 (duration: 04m 17s)
  • 10:55 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5
  • 10:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4 (duration: 04m 27s)
  • 10:50 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4
  • 10:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3 (duration: 03m 53s)
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:31 _joe_: uploading confd 0.16.0 to stretch
  • 10:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2 (duration: 01m 56s)
  • 10:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2
  • 10:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)
  • 10:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772
  • 09:55 marostegui: Deploy schema change on db2129 (s6 codfw master), this will generate lag on s6 codfw - T233135 T234066
  • 08:34 hashar: gerrit: force reindexing all changes ( gerrit index start changes --force )
  • 07:09 marostegui: Remove grants for dbproxy1006 on m1 databases - T231280
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9246 and previous config saved to /var/cache/conftool/dbconfig/20191007-065645-marostegui.json
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1011 T227138 (duration: 01m 10s)
  • 06:08 elukey: upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:25 marostegui: Deploy schema change on db2124 T233135 T234066
  • 05:10 marostegui: The above was for db2095:3316 T234704
  • 05:08 marostegui: Stop replication on db2076 to modify triggers on db2096:3316 T234704
  • 05:02 marostegui: Fix replication on labsdb1011:s8
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9245 and previous config saved to /var/cache/conftool/dbconfig/20191007-045411-marostegui.json

2019-10-06

  • 20:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Racconish /home/urbanecm/T234741 (T234741)
  • 19:15 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy1018, dbproxy1019
  • 06:47 elukey: delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam

2019-10-05

  • 06:48 elukey: force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory

2019-10-04

  • 22:06 mutante: ms-be1020 - power cycle via mgmt - host down
  • 20:43 krinkle@deploy1001: Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)
  • 20:41 mutante: deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)
  • 20:32 mutante: gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)
  • 19:27 mutante: wtp1025 - mediawiki appserver classes are being applied; the install in progress will trigger some new icinga alerts
  • 14:03 marostegui: Deploy schema change on db2117 T233135 T234066
  • 13:50 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:36 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:28 marostegui: Deploy schema change on db2097:3316 T233135 T234066
  • 12:23 elukey: cleaned up old files and apt-cache from an-coord1001
  • 08:41 marostegui: Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066
  • 08:32 _joe_: reuploading the old confd package to stretch-wikimedia, some incompatibility detected
  • 07:26 elukey: execute gnt-instance remove kerberos1001 on ganeti1001 - T234600
  • 07:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Deploy schema change on db2114 T233135 T234066
  • 06:22 _joe_: downgrading confd back to 0.9.0 while some templates get fixed.
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui: Deploy schema change on dbstore1005:3316 T233135 T234066
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:53 _joe_: upgrading confd on puppetmaster1001 T147204
  • 05:50 _joe_: uploading confd 0.16.0 on stretch T147204
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)

2019-10-03

  • 23:50 mutante: gerrit - restarting for replication config tweaks
  • 20:05 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 19:52 XenoRyet: updated payments-wiki from 80dead6444 to b94da68f7e
  • 19:40 mutante: mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153
  • 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 19:30 marxarelli: 1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors
  • 19:21 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25
  • 19:19 mutante: puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)
  • 19:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:52 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)
  • 18:43 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c2b3d7c (duration: 00m 59s)
  • 18:14 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)
  • 17:13 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)
  • 17:07 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7
  • 13:49 elukey: roll restart hadoop yarn resource managers for openssl updates on Hadoop workers
  • 13:44 marostegui: Stop MySQL and shutdown es1019 for on-site maintenance - T233698
  • 13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)
  • 13:29 hashar: Gerrit should be back
  • 13:26 hashar: restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl
  • 13:22 hashar: Gerrit might be dead again; taking traces
  • 13:04 _joe_: restarting php7 on mw1275
  • 12:54 onimisionipe: force shard allocation on eqiad chi cluster
  • 10:27 elukey: killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs
  • 10:25 jbond42: rolling upgrade of openssl packages
  • 10:21 Urbanecm: Manually cleared signup throttle for IP 80.188.128.54 at cswiki, issue with introduced throttle rule
  • 10:20 Urbanecm: Manually cleared signup throttle for IP 88.100.221.84 at cswiki, issue with introduced throttle rule
  • 10:18 Urbanecm: Manually cleared signup throttle for IP 90.176.155.12 at cswiki, issue with introduced throttle rule
  • 09:32 elukey: run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop)
  • 08:33 marostegui: Deploy schema change on db2087:3316 T233135 T234066
  • 08:28 marostegui: Deploy schema change on db1096:3316 - T233625
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9236 and previous config saved to /var/cache/conftool/dbconfig/20191003-082651-marostegui.json
  • 08:15 akosiaris: slowly rolling restart all pods in eqiad, codfw, staging for log rollover before merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539912
  • 07:49 marostegui: Set notes on the sanitarium masters - T234039
  • 07:19 marostegui: Remove unused labspuppet database from m5 - T233281
  • 07:03 @: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 07:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 06:59 eileen: tools revision changed from e1b81688c6 to b3c7453be2
  • 06:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 06:48 marostegui: Drop database grants on m5 for labspuppet - T233281
  • 06:37 marostegui: Rename tables on m5 master on designate_pool_manager - T233978
  • 06:16 marostegui: Deploy schema change on db2089:3316 T233135 T234066
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 eileen: civicrm revision changed from 12c5727a23 to c12f7bb51f, config revision is 422a0f7d48
  • 02:07 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1c599baea51f9 (duration: 01m 03s)
  • 01:05 mutante: gerrit1001 - shutdown - scheduled downtime
  • 00:51 mutante: gerrit1001 - removing wrong IPv6 address from interface, running puppet

2019-10-02

  • 23:42 XioNoX: enable cr2-eqiad:xe-4/0/0 - T234416
  • 23:38 XioNoX: disable cr2-eqiad:xe-4/0/0 - T234416
  • 23:22 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 00s)
  • 23:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 02s)
  • 22:29 godog: remove queued messages from mx1001 for fr-tech-ops@, triggering sender rate limit from gmail
  • 22:12 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:11 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 00m 59s)
  • 22:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 01m 00s)
  • 21:17 mutante: cobalt (gerrit) rsyncing /srv/gerrit/git and /srv/gerrit/plugins data to gerrit1001 again after reinstall and fixing gerrit2 UID/GID (T222391)
  • 21:13 mutante: gerrit1001 - rebooting
  • 21:08 mutante: gerrit1001 changing GID of gerrit2 user to 119 in /etc/group ; find / -uid 499 -exec chown gerrit2 {} \; find / -gid 1001 -exec chown gerrit2:gerrit2 {} \; (T222391)
  • 21:03 mutante: gerrit1001 changing UID of gerrit2 user to 114 and GID to 119 in /etc/passwd to match cobalt to avoid privilege issues after rsyncing data (T222391)
  • 19:58 mutante: puppetmaster1001 - sudo puppet cert clean parsoid.discovery.wmnet (only created yesterday but does not have all the SANs it needs, updating with more SANs) (T233654)
  • 19:47 Jeff_Green: deployed icinga fundraising-nsca collection configuration change
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 marxarelli: 1.34.0-wmf.25 promoted to group1, cc: T220750. no rise in relevant error rates
  • 19:23 dduvall@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.25 (duration: 00m 59s)
  • 19:22 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.25
  • 18:28 XioNoX: add BGP route damping on IX sessions - eqord - T222424
  • 18:25 XioNoX: add BGP route damping on IX sessions - eqdfw - T222424
  • 18:15 XioNoX: add BGP route damping on IX sessions - ulsfo - T222424
  • 17:08 Lucas_WMDE: Morning SWAT done
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: SWAT: vector.js: Remove eager calculation of p-cactions width on page load (duration: 01m 00s)
  • 16:53 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: Enabling revision-score stream in eventstreams
  • 16:50 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:50 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: (no justification provided) (duration: 00m 01s)
  • 16:50 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: ApiVisualEditor: Add logging for RESTBase HTTP errors (T233127) + ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 04s)
  • 16:42 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/: SWAT: ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 03s)
  • 15:31 godog: correction, add ms-be2052
  • 15:29 godog: swift codfw-prod: add ms-be2051 T233638
  • 15:13 godog: run swiftrepl eqiad -> codfw on ms-fe1005 (no deletes)
  • 14:31 moritzm: installing libxslt security updates on stretch
  • 14:16 moritzm: installing babeltrace bugfix update from buster point release
  • 13:18 moritzm: installing mariadb-10.3 update from buster point release (just client side libs, tools)
  • 13:16 moritzm: installing console-setup bugfix update from buster point release
  • 11:28 moritzm: installing cryptsetup bugfix from buster 10.1 point release
  • 11:26 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 01711d5: Enable partial blocks at ptwiki (T233754) (duration: 00m 55s)
  • 11:26 jbond42: update puppet.eqiad.wmnet to puppetmaster2001
  • 11:24 jbond42: update puppet.esams.wmnet to puppetmaster2001
  • 11:20 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set new MFMobileFormatterOptions config using old config (T232690) (duration: 01m 01s)
  • 11:15 _joe_: testing the package on restbase-dev1006
  • 11:14 _joe_: uploaded service-checker 0.2.0 to stretch-wikimedia
  • 11:12 pmiazga@deploy1001: Synchronized wmf-config/mobile.php: SWAT: Do not set wgMFNoindexPages config flag in mobile.php (T206497) (duration: 01m 14s)
  • 10:17 gehel@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:17 gehel@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:41 moritzm: rebalancing Ganeti/codfw Row A after rolling reboot of Ganeti nodes
  • 07:46 moritzm: upgrading remaining stretch hosts to ferm 2.4.2pre
  • 06:23 marostegui: Fix replication on labsdb1011:s7 - T233986
  • 06:17 marostegui: Fix replication on labsdb1011:s1 - T233986
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:07 vgutierrez: restarting trafficserver-tls on cp5007
  • 00:54 ejegg: updated fundraising CiviCRM from 6d90d0cf06 to 12c5727a23
  • 00:34 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/resources/src: 5eb3ae1 (duration: 01m 00s)
  • 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: d30064229f9 (duration: 00m 59s)

2019-10-01

  • 23:46 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: T233127: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)
  • 23:44 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:28 mutante: cobalt (gerrit) rsyncing /srv/gerrit/plugins dir, push to new server gerrit1001 (T222391)
  • 23:21 mutante: gerrit1001 - chown -R gerrit2:gerrit2 /srv/gerrit/git/ (T222391)
  • 23:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233211: CirrusSearch: Configuration for glent m0 AB test (duration: 00m 58s)
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233127: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)
  • 22:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 22:19 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 21:34 godog: swift codfw-prod: add ms-be2051 with minimal weight - T233638 T222366
  • 21:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: bb2fd9cf9c22cc (duration: 01m 00s)
  • 21:29 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 21:29 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 20:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 20:10 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:58 mutante: cobalt (gerrit) - rsyncing gerrit data to gerrit1001 in a screen session (T222391)
  • 19:47 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 19:47 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:42 marxarelli: 1.34.0-wmf.25 promoted to group0 cc: T220750. no rise in relevant error rates
  • 19:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.25
  • 19:30 marxarelli: promoting 1.34.0-wmf.25 to group0
  • 19:28 dduvall@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache (duration: 19m 31s)
  • 19:08 dduvall@deploy1001: Started scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache
  • 19:07 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.23 (duration: 01m 32s)
  • 19:04 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.22 (duration: 01m 41s)
  • 19:02 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.21 (duration: 01m 57s)
  • 19:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 19:00 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 18:59 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.20 (duration: 02m 11s)
  • 18:57 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.19 (duration: 02m 12s)
  • 18:54 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.17 (duration: 02m 48s)
  • 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 (duration: 18m 45s)
  • 17:53 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:52 thcipriani: gerrit restart for new config changes incoming
  • 17:52 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:48 XioNoX: rotate PDUs passwords - T233053
  • 17:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)
  • 17:07 mutante: Welcome new deployer Andrew Kostka (WMDE) (T233202)
  • 17:07 marxarelli: cutting wmf/1.34.0-wmf.25
  • 16:16 _joe_: manually downgrading php-geoip on deploy*, it was still at the 7.0-only version from the distro
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:10 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 15:36 _joe_: uninstalling temporarily the math rendering related packages from mwdebug2002, test for T195847
  • 15:36 elukey: powercycle an-conf1001 to test some bios settings
  • 15:12 jbond42: puppetmaster2001 is back online
  • 14:34 dcausse: created cirrussearch indices for nqowiki (T234326)
  • 14:18 moritzm: rebooting krb1001 for some tests
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:10 hashar: Restarting CI Jenkins
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/labs/private ; git rev-parse HEAD | sudo tee /srv/config-master/labsprivate-sha1.txt )
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/operations/puppet ; git rev-parse HEAD | sudo tee /srv/config-master/puppet-sha1.txt )
  • 14:08 herron: beginning rolling reboots of eqiad and codfw logstash collectors
  • 14:02 moritzm: rebooting mw1265 for some tests
  • 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:59 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ sudo touch /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt && sudo chown gitpuppet:gitpuppet /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:24 jbond42: reimage puppetmaster2001
  • 12:37 hashar: Gerrit misbehaved temporarily due to human operator error (hashar ran jstack -l -m, which brings the JVM to a halt)
  • 11:16 jbond42: update puppet.ulsfo.wmnet to point to puppetmaster1001
  • 10:45 jbond42: update puppet.eqsin.wmnet to point to puppetmaster1001
  • 10:17 moritzm: upgrading ferm on remaining mw servers 2.4.2pre T153468
  • 09:35 moritzm: run systemctl reset-failed on puppetmaster2002 to clear failed puppet-master.service
  • 09:19 moritzm: upgrading ferm on a number of systems to 2.4.2pre T153468
  • 09:07 vgutierrez: restarting acme-chief on acmechief1001 to catch up with python3-cryptography upgrades - T234131
  • 09:04 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acme-chief hosts - T234131
  • 09:03 moritzm: rebalancing ganeti/row_B after rolling reboot
  • 08:57 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acmechief-test1001 - T234131
  • 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:00 moritzm: draining ganeti2003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:00 hashar: gerrit: forcing reindex of changes # T233989
  • 06:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:28 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9223 and previous config saved to /var/cache/conftool/dbconfig/20191001-061956-marostegui.json
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:12 mutante: phabricator - upgrading PHP version to 7.2.22 - T230024

2019-09-30

  • 23:28 niharika29@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CentralNotice/resources/infrastructure/: CentralNotice: Replace deprecated editToken with csrfToken - T233538 (duration: 00m 57s)
  • 23:23 AndyRussG: updated fruec from c591bd653b to 18d89675d0
  • 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 21:47 mutante: mw1290 - scap pull to get it in sync with latest deployment - it was down during scap run for T234153
  • 21:42 jforrester@deploy1001: Synchronized robots.txt: Remove old InternetArchive bot rule that's been disabled since 2008 T7582 (duration: 00m 57s)
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T222539 Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)
  • 21:38 James_F: sync failure on mw1290.eqiad.wmnet – Connection timed out
  • 21:26 mutante: mw1290 - downtimed for onsite work on mgmt, depooled earlier
  • 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 21:08 XioNoX: delete BGP to AS131285 on cr1-eqsin
  • 20:43 arlolra: Updated Parsoid to 1922eb6 (T233459, T230359, T208070)
  • 20:43 arlolra: T208070
  • 20:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6 (duration: 08m 39s)
  • 20:25 arlolra@deploy1001: Started deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6
  • 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f (duration: 05m 55s)
  • 20:00 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f
  • 19:15 XenoRyet: Updated payments-wiki from 5193dcdfa9 to 80dead6444
  • 17:37 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 03m 03s)
  • 17:33 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:24 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:18 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 00m 05s)
  • 17:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:15 twentyafterfour@deploy1001: deploy aborted: fix T234223 (duration: 06m 24s)
  • 17:10 twentyafterfour: deploy failed
  • 17:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:08 twentyafterfour: deploying minor update to phatality to fix T234223
  • 16:35 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:34 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0aa4b4b (duration: 00m 57s)
  • 16:34 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226 (duration: 01m 17s)
  • 16:32 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: 0aa4b4b (duration: 00m 57s)
  • 16:32 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:25 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:49 moritzm: installing console-setup bugfixes from Buster 10.1 point release
  • 15:46 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:46 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:42 moritzm: failover Ganeti master in codfw to ganeti2001
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:29 moritzm: draining ganeti2007 for upcoming reboot (combined kernel/qemu security updates)
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:08 moritzm: draining ganeti2006 for upcoming reboot (combined kernel/qemu security updates)
  • 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:54 moritzm: draining ganeti2005 for upcoming reboot (combined kernel/qemu security updates)
  • 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 kart_: Update cxserver to 2019-09-26-034732-production (T233834, T232674, T233085)
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:29 jbond42: offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322
  • 12:27 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:00 Urbanecm: EU SWAT done #2
  • 12:00 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 3f4f242: New throttle rule for Czech wiki course (T234113) (duration: 00m 56s)
  • 11:57 Urbanecm: Reopen EU SWAT to deploy throttle rule for October 02 (T234113)
  • 11:54 raynor: EU SWAT finished
  • 11:54 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable alternate mobile link for it, nl, ko wikis. (T206497) (duration: 00m 57s)
  • 11:27 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 539517|Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007) (duration: 00m 59s)
  • 11:20 hashar: Restarting Docker on integration-agent-puppet-docker-1001 # T234197
  • 11:08 hashar: Restarting Docker on CI agents to clear out some docker/iptables oddity # T234197
  • 10:48 hashar: CI outage is tracked in https://phabricator.wikimedia.org/T234197
  • 10:42 moritzm: draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates)
  • 10:40 hashar: CI down due to some DNS related failure on the hosts :-\
  • 10:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:30 moritzm: uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups (T153468)
  • 09:11 moritzm: draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates)
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json
  • 08:01 moritzm: installing e2fsprogs security updates on Stretch/Buster
  • 07:56 marostegui: Stop dbstore1003:3311 for troubleshooting
  • 06:47 moritzm: installing exim security updates on buster

2019-09-28

  • 16:28 vgutierrez: restarting acme-chief on acmechief1001

2019-09-27

  • 22:44 mutante: phab2001 - apt-get autoremove - remove unused python and ruby packages
  • 22:36 mutante: phab2001 - upgrade php7.2 packages to 7.2.22 (T230024)
  • 22:03 mutante: webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names
  • 18:22 mutante: mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063)
  • 18:17 mutante: mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063)
  • 16:01 XioNoX: delete BGP to AS34305 on cr2-esams
  • 15:34 elukey: update pcc facts to add new hosts
  • 15:02 moritzm: installing usb.ids update from Buster 10.1 point release
  • 14:45 moritzm: installing ncurses bugfix update from Buster 10.1 point release
  • 14:39 moritzm: installing postgresql-common bugfix update from Buster 10.1 point release
  • 14:32 effie: Disable puppet and reload apache on mw* for 539465 and 539488 - T229792
  • 13:33 marostegui: Set candidate masters in dbctl T234039
  • 13:31 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:29 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:16 moritzm: reimaging auth1002 to buster
  • 13:09 akosiaris: reboot ganeti2001 T233906
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:03 effie: Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7
  • 12:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 moritzm: installing openldap security updates on Buster
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:37 moritzm: killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated)
  • 12:30 moritzm: installing glib2.0 security updates on Buster
  • 12:14 moritzm: reimaging auth2001 to buster
  • 12:06 moritzm: install gnupg2 security update from Buster 10.1 point release
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json
  • 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:02 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for Czech course (T234024) (duration: 00m 59s)
  • 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:06 moritzm: running a few ferm tests on cp1008, puppet disabled
  • 07:36 godog: swift eqiad-prod: remove ms-be1027 - T233289
  • 05:42 XioNoX: remove tcp-mss clamping from cr2-eqiad - T232602
  • 05:30 XioNoX: remove tcp-mss clamping from cr2-eqord - T232602
  • 05:23 XioNoX: remove tcp-mss clamping from cr1-eqiad - T232602
  • 00:53 twentyafterfour: hotfixing phabricator fatal exception refs T233998

2019-09-26

  • 22:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211620 Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
  • 22:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
  • 22:02 jforrester@deploy1001: Synchronized wmf-config/filebackend.php: T228547 Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
  • 21:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228547 Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
  • 21:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228547 Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
  • 20:53 ejegg: updated fundraising CiviCRM from 52d2a24404 to 6d90d0cf06
  • 19:58 phedenskog@deploy1001: Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s)
  • 19:58 phedenskog@deploy1001: Started deploy [performance/navtiming@1880a79]: Test deploy
  • 19:52 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:52 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:46 phedenskog@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs T220749
  • 19:17 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s)
  • 19:17 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release (test)
  • 19:13 twentyafterfour: preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs T220749
  • 18:45 ayounsi@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s)
  • 18:44 ayounsi@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:35 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s)
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s)
  • 18:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s)
  • 18:29 mforns@deploy1001: Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101)
  • 18:25 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s)
  • 18:25 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:17 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s)
  • 18:15 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s)
  • 18:15 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s)
  • 18:07 ayounsi@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s)
  • 18:06 ayounsi@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:04 mutante: running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for T233654
  • 18:04 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s)
  • 18:04 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:03 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s)
  • 18:02 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeouts arrays, already set in IS (duration: 01m 04s)
  • 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeouts arrays directly (duration: 01m 00s)
  • 17:49 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s)
  • 17:49 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s)
  • 17:43 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 (duration: 02m 04s)
  • 17:41 ppchelko@deploy1001: Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211
  • 17:35 elukey: run apt-get autoremove on stat* and notebook* to clean up old python2 deps
  • 17:31 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad
  • 17:11 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:08 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:40 papaul: upgrading firmware on scs-c1-codfw
  • 16:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw
  • 15:56 cdanis: sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams
  • 15:35 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo
  • 15:15 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin
  • 15:06 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s)
  • 15:03 mforns@deploy1001: Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap
  • 15:00 cdanis: dbctl schema migration done T229677
  • 14:47 cdanis: dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades T229677
  • 14:43 cdanis@cumin1001: dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json
  • 14:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin
  • 14:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster
  • 14:36 cdanis: ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool
  • 14:19 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes
  • 14:16 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes
  • 11:31 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 (T233922)
  • 11:23 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 3/3) (duration: 01m 05s)
  • 11:14 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104)
  • 11:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 2/3) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7645e55: Enable reader demographic surveys in English, Polish, and Russian (T232525) (duration: 01m 06s)
  • 11:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 1/3) (duration: 01m 08s)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 jbond42: reimaging puppetmaster1002 to buster
  • 10:48 vgutierrez: switching from nginx to ats-tls on cp5007 - T231627
  • 09:55 moritzm: bouncing postgres on puppetdb1002/2002
  • 09:18 vgutierrez: switching from nginx to ats-tls on cp1080 - T231433
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json
  • 09:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 (duration: 21m 32s)
  • 09:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:47 vgutierrez: switching from nginx to ats-tls on cp2008 - T231433
  • 08:43 mobrovac@deploy1001: Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json
  • 08:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from 1 to 100 - T231018', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json
  • 08:13 vgutierrez: switching from nginx to ats-tls on cp3036 - T231433
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json
  • 08:07 elukey: executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json
  • 08:02 marostegui: Depool db1078 to restart mysql to change its binlog format to ROW
  • 07:57 vgutierrez: switching from nginx to ats-tls on cp4023 - T231433
  • 07:49 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:42 moritzm: draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates)
  • 07:41 vgutierrez: switching from nginx to ats-tls on cp5003 - T231433
  • 07:10 marostegui: Power off db1114 for mainboard replacement T229452
  • 07:09 marostegui: Stop mysql on db1114 for mainboard replacement - T229452
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Sanitize nqowiki on db1124:3313 and db2094:3313 - T230543
  • 06:39 marostegui: Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give some API weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T230784', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T230784', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json
  • 05:00 marostegui: Starting s4 failover from db1081 to db1138 - T230784
  • 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T230784', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json
  • 04:10 marostegui: Start pre-switchover s4 steps T230784

2019-09-25

  • 21:59 bblack: remove GRE MTU hacks on archiva1001 gerrit2001 cobalt install1002 - T232602
  • 21:58 bblack: remove GRE MTU hacks on eqiad caches (cp1xxx) - T232602
  • 21:57 bblack: remove GRE MTU hacks on esams caches (cp3xxx) - T232602
  • 21:56 bblack: remove GRE MTU hacks on eqsin caches (cp5xxx) - T232602
  • 21:10 AndyRussG: update fruec from 97128874bf to c591bd653b
  • 21:00 ejegg: updated fundraising internal dashboard from 4473c65af0 to 69fdbec60d
  • 20:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286) (duration: 05m 32s)
  • 20:20 hashar: Upgrading CI Jenkins
  • 20:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286)
  • 19:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.24 refs T220749 (duration: 01m 03s)
  • 19:27 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.24 refs T220749
  • 18:24 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: trying again (duration: 03m 31s)
  • 18:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: trying again
  • 18:19 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: deploy for version 5.6.15 (duration: 00m 50s)
  • 18:19 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: deploy for version 5.6.15
  • 18:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: Deploy phatality (duration: 00m 24s)
  • 18:13 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: Deploy phatality
  • 18:11 Amir1: creating nqowiki is finished now
  • 18:10 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 18:07 ladsgroup@deploy1001: Synchronized dblists/rtl.dblist: Create nqowiki T230359 (duration: 01m 05s)
  • 18:01 Amir1: creating nqowiki is going to take five more minutes
  • 17:57 ladsgroup@deploy1001: Synchronized langlist: Create nqowiki T230359 (duration: 01m 02s)
  • 17:56 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Create nqowiki T230359 (duration: 01m 05s)
  • 17:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create nqowiki T230359 (duration: 01m 04s)
  • 17:51 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:47 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 04s)
  • 17:29 mutante: DNS - adding nqo (N'Ko) to langlist for new nqo.wikipedia, approved by langcom https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_N'Ko (T230359)
  • 17:11 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 05s)
  • 17:08 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 04s)
  • 16:19 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey for euwiki (T233063) (duration: 01m 04s)
  • 16:06 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 537628|Fix incorrect channel name for TranslationNotifications extension (T144780) (duration: 01m 06s)
  • 15:38 moritzm: installing php5 security updates
  • 15:07 moritzm: imported jenkins 2.176.4 for jessie/stretch T233214
  • 14:57 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:57 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:55 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/Wikibase/view/lib/resources.php: Revert "Merge valueview modules": T233800 (duration: 01m 04s)
  • 14:53 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix Draft namespace aliases (T233770) (duration: 01m 04s)
  • 14:52 onimisionipe: pool wdqs1005 - lag issues have minimized.
  • 14:38 moritzm: restarting apache on analytics-tool/an-tool to pick up Expat security update
  • 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:34 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: restarting apache on grafana1001 to pick up Expat security update
  • 14:14 moritzm: restarting apache on various services to pick up Expat security update (releases, netmon, miscweb, graphite, planet, puppetboard)
  • 14:02 marostegui: Deploy schema change on db2086:3318
  • 14:00 effie: Rolling restart thumbor for expat update
  • 13:55 moritzm: rolling restart of apache on webperf* to pick up Expat security update
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9183 and previous config saved to /var/cache/conftool/dbconfig/20190925-135317-marostegui.json
  • 13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:51 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:45 _joe_: restarting trafficserver on cp1075 to pick up the change
  • 13:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230817 Remove origin trials config (duration: 01m 05s)
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9182 and previous config saved to /var/cache/conftool/dbconfig/20190925-133146-marostegui.json
  • 13:31 moritzm: installing remaining expat security updates
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9181 and previous config saved to /var/cache/conftool/dbconfig/20190925-132147-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9180 and previous config saved to /var/cache/conftool/dbconfig/20190925-131149-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after replacing its BBU', diff saved to https://phabricator.wikimedia.org/P9179 and previous config saved to /var/cache/conftool/dbconfig/20190925-130613-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9178 and previous config saved to /var/cache/conftool/dbconfig/20190925-125601-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): ' Depool for schema change on the logging table: db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9177 and previous config saved to /var/cache/conftool/dbconfig/20190925-125140-marostegui.json
  • 12:47 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:47 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:46 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 marostegui: Repool labsdb1011 T233766
  • 12:41 marostegui: Shutdown db1075 for onsite maintenance T233534
  • 12:37 marostegui: Stop MySQL on db1075 for BBU replacement T233534
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for BBU replacement T233534', diff saved to https://phabricator.wikimedia.org/P9176 and previous config saved to /var/cache/conftool/dbconfig/20190925-123736-marostegui.json
  • 12:34 onimisionipe: depool wdqs1005 to allow it to catch up on lag
  • 12:32 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 12:28 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 12:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286) (duration: 05m 17s)
  • 12:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286)
  • 12:05 akosiaris: depool kubernetes1001 and disable puppet on it for rsyslog mmkubernetes testing
  • 12:05 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.*
  • 11:57 vgutierrez: switch cp1078 from nginx to ats-tls - T231433
  • 11:37 vgutierrez: switch cp2005 from nginx to ats-tls - T231433
  • 11:29 onimisionipe: restarted wdqs-blazegraph on wdqs1005
  • 11:15 onimisionipe: repooled wdqs1004 to reduce load on the wdqs public cluster
  • 11:15 Urbanecm: EU SWAT done
  • 11:13 vgutierrez: switch cp3035 from nginx to ats-tls - T231433
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 127485c: Fully close bgwikinews (T233322) (duration: 01m 06s)
  • 10:48 vgutierrez: Switch from nginx to ats-tls on cp4022 - T231433
  • 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:27 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 16s)
  • 10:26 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:26 vgutierrez: switch cp5002 from nginx to ats-tls - T231433
  • 10:25 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 12s)
  • 10:25 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:22 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 42s)
  • 10:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 45m 54s)
  • 09:51 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'codfw' .
  • 09:27 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:20 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 02m 24s)
  • 09:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:16 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 54s)
  • 09:15 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 09:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:01 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 08:52 godog: roll-restart kibana
  • 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 05s)
  • 08:48 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 09m 26s)
  • 08:44 vgutierrez: repooling cp4027 - T233667
  • 08:39 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 07:51 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 revert: [cirrus] temp disable sanity check (duration: 01m 05s)
  • 07:38 moritzm: installing emacs updates for buster (from SUA update, extended ELPA repository key)
  • 07:28 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 04s)
  • 07:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 16s)
  • 07:17 onimisionipe: pool wdqs1005 to allow depooling wdqs1004 to handle lag issues
  • 07:17 elukey: allow analytics users to log in into stat1005
  • 06:33 _joe_: restarting pybal on all low-traffic lbs
  • 06:29 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 06:29 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 06:21 marostegui: Deploy schema change on db2085:3311 T233625
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9171 and previous config saved to /var/cache/conftool/dbconfig/20190925-062036-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:06 marostegui: Run a data check on labsdb1011 - T233766
  • 04:43 marostegui: Deploy schema change on s3 with replication - T231172
  • 03:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.24 refs T220749
  • 03:03 krinkle@deploy1001: Synchronized docroot/noc/: c7c6c0ee0, 8405bf1c2 (duration: 01m 05s)
  • 03:01 krinkle@deploy1001: Synchronized src/: c7c6c0ee0, 8405bf1c2 (for noc.wm.o) (duration: 01m 09s)
  • 02:58 twentyafterfour: belatedly promoting wmf.24 to group0 refs T220749
  • 02:32 onimisionipe: depool wdqs1005 to let it catch up with lag
  • 02:30 onimisionipe: pool wdqs1006 - it has caught up with lag
  • 01:16 mutante: stat1007 - restart nagios-nrpe-server, echo "please don't use all of the RAM on this server" | wall
  • 01:14 krinkle@deploy1001: Synchronized wmf-config/: 3373247e12 (duration: 01m 04s)
  • 01:12 krinkle@deploy1001: Synchronized src/WmfClusters.php: 3373247e123b (duration: 01m 04s)
  • 01:08 krinkle@deploy1001: Synchronized tests: 3373247e123b5 (duration: 01m 04s)
  • 01:07 krinkle@deploy1001: Synchronized docroot/noc: 3373247e123b53 and 1efc8bd (duration: 01m 05s)
  • 01:03 krinkle@deploy1001: Synchronized README: 3373247e123b53 (duration: 01m 04s)
  • 01:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3373247e123b53 - create new file (duration: 01m 05s)
  • 00:47 krinkle@deploy1001: Synchronized wmf-config/: 6dca83a9f6c2c (duration: 01m 04s)
  • 00:44 krinkle@deploy1001: Synchronized docroot/noc/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:43 krinkle@deploy1001: Synchronized tests/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:02 mutante: cp1075 - systemctl restart vhtcpd
  • 00:02 mutante: cp1075 - systemctl status vhtcpd

2019-09-24

  • 23:38 mutante: gerrit service restart to switch LDAP backend
  • 23:35 bstorm_: wiki-replicas depooled labsdb1011
  • 23:33 mutante: gerrit2001 - restarting gerrit service
  • 23:30 mutante: switching LDAP servers used by Gerrit to readonly replicas. stop using so-called "labs" config for LDAP backend.
  • 22:26 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.24 refs T220749 (duration: 40m 38s)
  • 21:53 mutante: restbase1024 - enable IPMI over LAN which wasn't working before
  • 21:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.24 refs T220749
  • 21:19 mutante: ganeti4001 - racadm racreset - attempt to fix IPMI
  • 20:19 twentyafterfour: restarting gerrit due to unreasonably high garbage collection times and sluggish performance in general.
  • 19:39 XioNoX: disable asw2-d-eqiad:ge-5/0/41 excessive flapping
  • 19:28 ejegg: updated payments-wiki from 939b771800 to 5193dcdfa9
  • 19:20 twentyafterfour: branching 1.34.0-wmf.24 refs T220749
  • 18:45 AndyRussG: updated fruec from fb29cb74 to 97128874bf
  • 18:08 ejegg: updated Fundraising CiviCRM feca96a2e3 to 52d2a24404
  • 17:13 cstone: civicrm revision changed from 5def62ab05 to feca96a2e3
  • 14:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:28 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:17 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 moritzm: rebooting cloudvirt1021 for kernel update
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 13:50 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:50 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:49 jbond42__: promote puppetmaster1003 to a real puppetmaster backend https://gerrit.wikimedia.org/r/c/operations/puppet/+/538686
  • 13:45 _joe_: installing the new conftool version on the cumin hosts
  • 13:40 _joe_: uploaded conftool 1.1.4-3 to stretch-wikimedia, T233679
  • 13:19 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 13:18 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 12:22 arturo: remove systemd-sysv from jessie-wikimedia/openstack-mitaka-jessie in install1002 (T231793)
  • 12:20 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 [cirrus] temp disable sanity check (duration: 00m 55s)
  • 12:18 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 12:16 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455
  • 11:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 11:45 mobrovac@deploy1001: Started deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455
  • 11:43 Urbanecm: EU SWAT done
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 11a48f8: Add support for some languages on Commons and stop support for nys on Wikidata (T230480) (duration: 00m 56s)
  • 11:39 Urbanecm: Run mwscript initSiteStats.php --wiki=napwikisource --update (T233673)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 9eaa4f8: Set wgArticleCountMethod to any for napwikisource (T233673) (duration: 00m 56s)
  • 11:30 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/mxwikimedia.png (T233670)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: b6947c5: Follow-up 8f3f0705baed: add missing namespace for eswiki (T233562) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MassMessage/: SWAT: ba9b209: Provide deduplication info to MassMessageJob (T232379) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1001: Synchronized static/images/project-logos/mxwikimedia.png: SWAT: 246b352: Update logo for mx.wikimedia (T233670) (duration: 00m 54s)
  • 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.less: SWAT: d4c64a7: Fix broken display of mobile overlay headings (T233163) (duration: 00m 57s)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8bf6aae: Enable alternate mobile link for ar,zh,hi wikis (T206497) (duration: 00m 54s)
  • 11:10 _joe_: all wikis (including API) are now served by PHP7 T219150
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a14b772: FileImporter: limited default deployment (2/2; T232539) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8a89652: FileImporter: limited default deployment (1/2; T232539) (duration: 01m 03s)
  • 10:56 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584 (duration: 01m 00s)
  • 10:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584
  • 10:54 _joe_: converting all appservers to php7, T219150
  • 10:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953 (duration: 22m 20s)
  • 10:50 _joe_: converting mw1261 to full-php7
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953
  • 10:12 marostegui: Deploy schema change on s7 (centralauth and wikis) master with replication - T231172
  • 10:03 marostegui: Deploy schema change on s1 master with replication - T231172
  • 09:58 marostegui: Deploy schema change on labswiki (wikitech) and labtestwiki T231172
  • 09:51 effie: Upgrade to php 7.2.22 on mwmaint* - T230024
  • 09:30 marostegui: Deploy schema change on s2 master with replication - T231172
  • 09:26 effie: Upgrade to php 7.2.22 on deploy* - T230024
  • 09:14 marostegui: Drop table archive_save on frwiki T233187
  • 08:43 marostegui: Deploy schema change on s8 master with replication - T231172
  • 08:37 mvolz@deploy1001: scap-helm zotero finished
  • 08:37 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 08:37 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:36 jynus: stop db1114 mariadb process for some time
  • 08:33 moritzm: installed expat security updates on remaining mw* servers
  • 08:33 mvolz@deploy1001: scap-helm zotero finished
  • 08:32 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:30 marostegui: Deploy schema change on s4 master with replication - T231172
  • 08:29 effie: Disable puppet on api cluster and restart php-fpm to finish php7 migration - T219150
  • 08:19 mvolz@deploy1001: scap-helm zotero finished
  • 08:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 08:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 08:18 marostegui: Deploy schema change on s5 master with replication - T231172
  • 07:51 onimisionipe: depool wdqs1006 to clear HTTP "too many requests" error
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 moritzm: uploaded openjdk-8 8u222-b10-1~deb10u2 to buster-wikimedia component/jdk8 T233604
  • 07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 godog: swift eqiad-prod: continue ms-be1027 decom T233289
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:37 marostegui: Stop MySQL on db1066 - T233071
  • 06:36 marostegui: Remove db1066 from tendril and zarcillo T233071
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075', diff saved to https://phabricator.wikimedia.org/P9163 and previous config saved to /var/cache/conftool/dbconfig/20190924-063002-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9162 and previous config saved to /var/cache/conftool/dbconfig/20190924-061943-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9161 and previous config saved to /var/cache/conftool/dbconfig/20190924-053919-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1075', diff saved to https://phabricator.wikimedia.org/P9160 and previous config saved to /var/cache/conftool/dbconfig/20190924-052545-marostegui.json
  • 05:13 cdanis@cumin1001: dbctl commit (dc=all): 're-do T230783 master promotion and set read-write', diff saved to https://phabricator.wikimedia.org/P9159 and previous config saved to /var/cache/conftool/dbconfig/20190924-051307-cdanis.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 master and remove read-only from s3 T230783', diff saved to https://phabricator.wikimedia.org/P9158 and previous config saved to /var/cache/conftool/dbconfig/20190924-051147-marostegui.json
  • 05:10 cdanis: T230783 mark DEFAULT not s3 as readonly in etcd dbconfig data
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 as read-only for maintenance T230783', diff saved to https://phabricator.wikimedia.org/P9157 and previous config saved to /var/cache/conftool/dbconfig/20190924-050034-marostegui.json
  • 05:00 marostegui: Starting s3 failover from db1075 to db1123 - T230783
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1123 T230783', diff saved to https://phabricator.wikimedia.org/P9156 and previous config saved to /var/cache/conftool/dbconfig/20190924-042121-marostegui.json
  • 04:13 marostegui: Start pre switchover steps - T230783
  • 03:52 chaomodus: rebooted netboxdb[12]001 for kernel upgrade
  • 03:46 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:45 crusnov@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:43 mutante: db2060 - remove PXE flag boot override - set Boot Device to none

2019-09-23

  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:50 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:50 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:43 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:32 catrope@deploy1001: Synchronized wmf-config/VariantSettings.php: Syncing no-op change for T232419 (duration: 00m 57s)
  • 19:57 cdanis: T233657 ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 2a7a125: Redefine hiwikisource extra namespaces (T233365) (duration: 00m 57s)
  • 19:09 Urbanecm: Going to deploy one more last-time patch
  • 18:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 (T233610) (duration: 00m 56s)
  • 18:48 Urbanecm: Morning SWAT done
  • 18:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 37fcbdf: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: be2f9d4: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 55s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: d397f5f: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 56s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8f3f070: Disallow indexing discussion and user pages on eswiki (T233562) (duration: 00m 56s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6cb2042: New throttle rule for Wikimedia Chile editathon (T233378) (duration: 00m 56s)
  • 18:13 Urbanecm: Security deploy for T207094
  • 18:03 gilles: T233095 Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
  • 17:59 gilles@deploy1001: Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 00m 56s)
  • 17:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config (T233610) (duration: 00m 58s)
  • 16:53 elukey@deploy1001: Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
  • 16:46 elukey@deploy1001: Started deploy [analytics/refinery@b99647e]: (no justification provided)
  • 16:33 Urbanecm: Remove my temporary adminship on bgwikinews (T233322)
  • 16:29 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 2/2) (duration: 00m 56s)
  • 16:27 urbanecm@deploy1001: Synchronized dblists/closed.dblist: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 1/2) (duration: 00m 58s)
  • 16:26 Urbanecm: mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary (T233322)
  • 13:21 moritzm: installing qemu security updates on remaining cloudvirt hosts
  • 12:40 moritzm: rolling restart of graphoid on scb to pick up expat security update
  • 12:05 moritzm: restarting apache on bast5001 to pick up expat security update
  • 11:50 moritzm: restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
  • 11:42 vgutierrez: switching cp4027 from nginx to ats-tls - T231627
  • 11:35 moritzm: installing expat security updates
  • 11:33 awight: EU SWAT finished
  • 11:31 awight@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: Add change tags to all FileImport text revisions (T227849) (duration: 00m 57s)
  • 11:23 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Set item terms on write both up to Q40Mio (T225055) (duration: 00m 55s)
  • 11:12 effie: Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - T219150
  • 11:11 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 56s)
  • 11:06 awight@deploy1001: Synchronized static/images/project-logos: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 57s)
  • 11:05 moritzm: uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) T233604
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:51 jynus: stopping db2102 mariadb to recover db
  • 09:45 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' (T233585)
  • 09:44 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' (T233585)
  • 09:38 akosiaris: T218184 upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
  • 09:02 effie: Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - T219150
  • 08:31 elukey@deploy1001: Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
  • 08:24 elukey@deploy1001: Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
  • 07:41 godog: swift run swiftrepl without deletes eqiad -> codfw
  • 07:40 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
  • 07:08 marostegui: Stop MySQL on db1123 to reboot to change binlog format and kernel - T230783
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to change binlog format T230783', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1066 from config T233071 (duration: 00m 56s)
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1066 from config T233071 (duration: 01m 15s)

2019-09-22

  • off: marostegui set s3 master RW

2019-09-21

  • 05:42 shdubsh: re-enable input-kafka-rsyslog-shipper in codfw
  • 05:33 shdubsh: drop input-kafka-rsyslog-shipper in codfw
  • 02:15 bblack: dbproxy1017: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 02:14 bblack: dbproxy1016: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 01:52 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash2004-5-6
  • 01:34 mutante: restarting mobileapps service on scb*
  • 01:34 mutante: restarted mobileapps service on scb1001
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 01:21 bblack: re-pooling cp108[78] in D2 via confctl
  • 01:14 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash1007
  • 01:08 shdubsh: removed input-kafka-rsyslog-shipper-eqiad/codfw from logstash inputs logstash1008 and logstash1009
  • 00:54 mutante: aqs1009 - systemctl restart aqs
  • 00:54 mutante: aqs1006 - systemctl restart aqs
  • 00:48 mutante: aqs1005 - systemctl restart aqs
  • 00:46 shdubsh: restarting logstash on logstash1008 without udp-localhost-eqiad/codfw configs
  • 00:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1088.eqiad.wmnet
  • 00:38 bblack: depooling confctl things in rack D2
  • 00:38 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2019-09-20

  • 21:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: fix T233453 (duration: 00m 56s)
  • 21:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: fix T233453 (duration: 00m 58s)
  • 19:26 XioNoX: update eqsin firewall filters - T233268
  • 16:35 krinkle@deploy1001: Synchronized vendor/: ead70240892e9 (duration: 00m 59s)
  • 16:14 XioNoX: update eqiad firewall filters - T233268
  • 16:11 XioNoX: update esams firewall filters - T233268
  • 15:17 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bgwiki --logwiki=metawiki 'Newrdkter' 'NRdk' (T233313)
  • 15:03 XioNoX: remove AS-PATH prepending in ams
  • 11:29 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:16 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:15 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 09:31 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:52 jynus: creating new database on m1 "bacula9" T229209
  • 08:28 hashar: Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390
  • 08:23 hashar: CI is failing since it is somehow no longer able to fetch from Gerrit T233390
  • 08:20 hashar: contint1001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 08:12 hashar: contint2001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:46 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:45 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:14 godog: eqiad-prod: start ms-be1027 decom - T233289
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic T223151', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json
  • 05:27 marostegui: Analyze table enwiki.logging on db2102 - T223151
  • 05:07 marostegui: Remove temporary index on hiwikisource views T219374
  • 01:06 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC (duration: 02m 51s)
  • 01:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/TimedMediaHandler/: T233360 Fix Safari 13.0 regression in video playback with audio (duration: 00m 58s)
  • 01:03 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC

2019-09-19

  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 ejegg: updated payments-wiki from adef0e858f to 939b771800
  • 22:34 mutante: gerrit1001 - stopping puppet, removing gerrit IP from interface, rebooting
  • 21:37 niharika29@deploy1001: Synchronized wmf-config/VariantSettings.php: Enable special:mute on testwiki; T231577 (duration: 00m 56s)
  • 20:15 XioNoX: push firewall policies to pfw3-eqiad - T233325
  • 20:07 XioNoX: push firewall policies to pfw3-codfw - T233325
  • 19:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.23 refs T220748
  • 19:02 twentyafterfour: There are currently no blockers for T220748 so I am preparing to deploy 1.34.0-wmf.23 to all wikis.
  • 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 18:14 XioNoX: add TCP-MSS 1436 to cr2-eqiad external interfaces - T232602
  • 18:12 XioNoX: add TCP-MSS 1436 to cr1-eqiad external interfaces - T232602
  • 18:01 bblack: lvs2004 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:55 mutante: puppetmaster1001 - add mcrouter cert for mw1298.eqiad.wmnet (T192457)
  • 17:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 17:48 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, 32cf50453cd (duration: 01m 04s)
  • 17:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2 (duration: 08m 52s)
  • 17:43 Krinkle: Move whisper/MediaWiki/wanobjectcache/revision_row_1/29 to whisper/MediaWiki/wanobjectcache/revision_row_1_29 on graphite1004 and graphite2003 (T232907)
  • 17:38 arlolra@deploy1001: Started deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2
  • 17:27 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:27 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/includes/libs/objectcache/wancache: 2e910c9, T232907 (duration: 01m 03s)
  • 17:23 bblack: lvs2005 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:19 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:16 bblack: lvs200[456] - puppet disabled for https://gerrit.wikimedia.org/r/536324 deploy/test
  • 17:14 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062 (duration: 05m 42s)
  • 17:08 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062
  • 16:31 _joe_: manually removed the purge_checkuser cron from mwmaint1002, to have puppet recreate it
  • 16:20 ejegg: updated fundraising CiviCRM from 90db6cb5a1 to 5def62ab05
  • 16:15 papaul: shutting down scs-a1-codfw for replacement
  • 15:26 moritzm: repooling restbase2012 after completed Cassandra bootstrap T224553
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=restbase,service=cassandra,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-backend,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-ssl,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 15:05 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:56 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286) (duration: 05m 39s)
  • 14:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286)
  • 14:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3 (duration: 10m 42s)
  • 14:37 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3
  • 14:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2 (duration: 08m 24s)
  • 14:31 mobrovac: bootstrap restbase2012-c -- T224553
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2
  • 14:28 mobrovac@deploy1001: deploy aborted: Remove the TID suffix in the ETag, if present - T230272 (duration: 11m 20s)
  • 14:28 sbassett: Deployed security patch for T224203 (php-1.34.0-wmf.23)
  • 14:19 sbassett: Deployed security patch for T224203
  • 14:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 14:18 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:17 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present - T230272
  • 13:54 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750) (duration: 03m 06s)
  • 13:51 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750)
  • 13:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/Translate: T233308 (duration: 01m 07s)
  • 13:14 moritzm: powercycling mw1300
  • 13:12 mobrovac: bootstrap restbase2012-b -- T224553
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1089 into contributions service T223151', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json
  • 13:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553 (duration: 21m 38s)
  • 12:39 mobrovac@deploy1001: Started deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553
  • 12:36 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:48 mobrovac: bootstrap restbase2012-a -- T224553
  • 11:32 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 199a05c: Add new throttle rule for Czech wiki course (T233199) (duration: 01m 01s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: eab7c6a: c80f026: GrowthExperiments: Enable Special:Homepage for euwiki, GrowthExperiments: Enable help panel for euwiki (T233066, T233065) (duration: 01m 05s)
  • 09:54 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: security T207094 (duration: 01m 02s)
  • 09:53 urbanecm@deploy1001: sync-file aborted: security T207094 (duration: 00m 28s)
  • 09:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: security T207094 (duration: 01m 05s)
  • 09:22 godog: power back on ms-be1027, found with power off
  • 08:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 393441b: Change configuration of AbuseFilter extension for enwikisource (T231750) (duration: 01m 04s)
  • 08:22 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: revert T207094 (duration: 01m 04s)
  • 08:20 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: security T207094 (duration: 01m 06s)
  • 08:11 marostegui: Rename tables on db1133:labspuppet T233281
  • 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:40 moritzm: rebooting failoid1001 for kernel update
  • 07:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give more logpager weight to db1089 T223151', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json
  • 07:01 moritzm: reimaging restbase2012 to stretch T224553
  • 06:18 marostegui: Sanitize hiwikisource on db1124:3313 and db2094:3313 T219374
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager T223151', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json
  • 05:11 marostegui: Stop MySQL on db2055 for decommission T233186
  • 05:11 marostegui: Remove db2055 from tendril and zarcillo T233186

2019-09-18

  • 23:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MobileFrontend/resources/dist/: T233260, 1667ed9 (duration: 01m 04s)
  • 22:58 cmjohnson1: enabled asw2-c-eqiad interface xe-2/0/45
  • 22:40 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/resources/Resources.php: d6dadfd (duration: 01m 03s)
  • 22:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, ff44043efa59e9 (duration: 01m 05s)
  • 22:13 cmjohnson1: disabling asw2-c-eqiad xe-2/0/45 - cr1-eqiad to replace optic T233265
  • 21:54 gilles: T233095 Purging all eswiki articles (both desktop and mobile this time)
  • 21:53 gilles@deploy1001: Synchronized php-1.34.0-wmf.22/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 01m 04s)
  • 21:13 XioNoX: enable damping on primary codfw-eqiad link - T196432
  • 21:09 XioNoX: enable damping on codfw-ulsfo link - T196432
  • 20:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No longer load InitialiseSettings at all in CommonSettings (duration: 01m 03s)
  • 20:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Quick fix for wmfLoadInitialiseSettings() (duration: 01m 03s)
  • 20:40 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 20:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out call to InitialiseSettings.php (duration: 01m 04s)
  • 20:18 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Drop support for serialised PHP (duration: 01m 04s)
  • 20:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP T223602 (duration: 01m 04s)
  • 20:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:07 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208246 Enforce a 10-byte password for privileged users (duration: 01m 04s)
  • 19:57 urandom: decommissioning Cassandra, restbase2012-c -- T224553
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:42 gilles: T233095 Purging all pages on eswiki
  • 19:27 joal@deploy1001: Finished deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix (duration: 03m 40s)
  • 19:24 mutante: ganeti1001 - deleting krypton.eqiad.wmnet - decom T231546
  • 19:23 joal@deploy1001: Started deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.23 refs T220748 (duration: 01m 04s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.23 refs T220748
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:07 twentyafterfour: There appear to be no blockers on T220748 so I'll proceed with deploying 1.34.0-wmf.23 to group 1.
  • 19:01 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix (duration: 02m 12s)
  • 18:59 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix
  • 18:55 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train (duration: 01m 05s)
  • 18:54 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train
  • 18:46 XioNoX: remove `border-in4 term ddos-0906` from all routers
  • 17:53 Amir1: Creating hiwikisource is done
  • 17:50 urandom: decommissioning Cassandra, restbase2012-b -- T224553
  • 17:48 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 32s)
  • 17:45 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Add hiwikisource logos (T218155) (duration: 01m 04s)
  • 17:43 ladsgroup@deploy1001: Synchronized wmf-config/VariantSettings.php: Add hiwikisource (T218155) (duration: 01m 05s)
  • 17:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hiwikisource (T218155) (duration: 01m 04s)
  • 17:38 Amir1: manual write on hiwikisource "wikiadmin@10.64.0.205(hiwikisource)> update text set old_text = 'DB://cluster25/1';" (T218155)
  • 17:33 Amir1: mwscript maintenance/createAndPromote.php --wiki=hiwikisource --force --sysop Ladsgroup (T218155)
  • 17:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:22 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 06s)
  • 17:22 Jeff_Green: authdns-update to deploy DNS for new fundraising host
  • 17:03 mutante: ganeti2004 - resetting DRAC in an attempt to make IPMI work again
  • 17:00 Urbanecm: Morning SWAT done
  • 16:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable DNS blacklist on testwiki temporarily (T230822) (duration: 01m 03s)
  • 16:43 Urbanecm: 8340be9 sync is for T230822, mistakenly inserted `test` instead of the task number
  • 16:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8340be9: Enable logging for BlockManager channel at info level (test) (duration: 01m 04s)
  • 16:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: dc1298d: Add Draft and Draft_talk aliases for wikis that define draft namespace (T223472) (duration: 01m 02s)
  • 16:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 6e59651: Disable FundraiserLandingPage extension on test.wikipedia.org (T203020) (duration: 01m 04s)
  • 16:26 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/tewikisource.png (T232065)
  • 16:25 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 2/2) (duration: 01m 06s)
  • 16:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 1/2) (duration: 01m 05s)
  • 16:18 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 817d679: Turn on EventLogging at 100% for DonateWiki (T233145) (duration: 01m 04s)
  • 16:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: ba30276: Add suppressredirect right to filemovers on bnwiki (T233137) (duration: 01m 05s)
  • 15:55 moritzm: repooling restbase2011 after reimage/bootstrap
  • 15:53 urandom: decommissioning Cassandra, restbase2012-a -- T224553
  • 15:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:59 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-backend
  • 14:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 joal@deploy1001: Finished deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train (duration: 05m 28s)
  • 13:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 joal@deploy1001: Started deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 hashar: Restarting Jenkins, starting Zuul
  • 12:56 marostegui: Deploy schema change on the following s6 hosts: db1088, db1093, db1096, db1098, db1139, dbstore1005 - T231172
  • 12:52 hashar: gracefully stopping Zuul (kill SIGUSR1) to prepare for Jenkins restart
  • 12:40 marostegui: Deploy schema change on s6 codfw master with replication T231172
  • 12:18 vgutierrez: restarting ats-tls to avoid spreading Proxy-Connection header - T233205
  • 12:03 marostegui: Stop haproxy on dbproxy1006 - T233207
  • 11:29 mobrovac: bootstrap restbase2011-c -- T224553
  • 11:27 awight: EU SWAT complete
  • 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 00m 59s)
  • 11:25 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: NowCommons test & test2wiki configuration (T228851) (duration: 01m 15s)
  • 10:17 onimisionipe: force relocation of shards for eqiad search(chi) cluster
  • 10:16 moritzm: restarting postgres on puppetdb1002/2002 after updating permissions for replication user
  • 10:00 mobrovac: bootstrap restbase2011-b -- T224553
  • 09:37 godog: run swiftrepl eqiad -> codfw on all containers, no deletes
  • 09:37 effie: upgrading netmon* to PHP 7.2.22 T230024
  • 09:35 godog: run swiftrepl eqiad -> codfw for transcoded containers
  • 08:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9125 and previous config saved to /var/cache/conftool/dbconfig/20190918-085721-marostegui.json
  • 08:22 mobrovac: bootstrap restbase2011-a -- T224553
  • 07:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 07:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:43 moritzm: reimaging restbase2011 to stretch T224553
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P9124 and previous config saved to /var/cache/conftool/dbconfig/20190918-060401-marostegui.json
  • 05:58 marostegui: Deploy schema change on db2097:3316 - T233135
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool host after onsite checks T233184', diff saved to https://phabricator.wikimedia.org/P9123 and previous config saved to /var/cache/conftool/dbconfig/20190918-054755-marostegui.json
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2055 from config T233186 (duration: 01m 04s)
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2055 from config T233186 (duration: 01m 06s)
  • 05:03 marostegui: Start MySQL on db2127 T233184
  • 03:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.util/: 0333729e, ccfe88241 (duration: 01m 07s)

2019-09-17

  • 23:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.23 refs T220748
  • 23:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/VisualEditor/extension.json: aae62a8 (duration: 01m 05s)
  • 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 22:43 dzahn@cumin1001: Updating IPMI password on 6 hosts - dzahn@cumin1001
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add comment about MinimumPasswordLengthToLogin (duration: 01m 03s)
  • 21:45 cstone: civicrm revision changed from 45dbfdb96f to 90db6cb5a1
  • 21:45 tzatziki: removed one file for legal compliance
  • 21:12 XioNoX: delete AS13335 91.198.174.0/24 RPKI/ROA
  • 21:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 21:10 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 21:10 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:08 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:07 twentyafterfour@deploy1001: Finished scap: testwikis to 1.34.0-wmf.23 refs T220748 (duration: 24m 55s)
  • 21:01 XioNoX: enable interface damping on primary eqiad-esams link (eqiad side) - T196432
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:47 dzahn@cumin1001: Updating IPMI password on 660 hosts - dzahn@cumin1001
  • 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:42 twentyafterfour@deploy1001: Started scap: testwikis to 1.34.0-wmf.23 refs T220748
  • 20:39 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:31 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/phpCharToUpper.json: 8372dcd (duration: 00m 56s)
  • 20:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/Title.js: 8372dcd (duration: 02m 08s)
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 21 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 tzatziki: changing email for User:Olag
  • 20:12 dzahn@cumin1001: Updating IPMI password on 18 hosts - dzahn@cumin1001
  • 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:04 dzahn@cumin1001: Updating IPMI password on 29 hosts - dzahn@cumin1001
  • 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:32 ejegg: updated payments-wiki from fc82318180 to adef0e858f
  • 19:26 dzahn@cumin1001: Updating IPMI password on 543 hosts - dzahn@cumin1001
  • 19:25 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:22 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:20 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:14 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:08 twentyafterfour: Branch cut is in progress for 1.34.0-wmf.23
  • 19:05 urandom: decommissioning Cassandra, restbase2011-c -- T224553
  • 18:06 papaul: upgrading firmware on scs1-a1-codfw
  • 17:18 ejegg: updated SmashPig payments listener from a0151434f4 to dc0c6b208b
  • 17:09 urandom: decommissioning Cassandra, restbase2011-b -- T224553
  • 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 17:00 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 16:04 jbond42: run octocatalog-diff from elnath with current facts
  • 15:55 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 55s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 15:39 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:38 urandom: decommissioning Cassandra, restbase2011-a -- T224553
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Host down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9120 and previous config saved to /var/cache/conftool/dbconfig/20190917-151714-marostegui.json
  • 15:16 marostegui: Stop MySQL on db2127 and shut the host down for onsite maintenance
  • 14:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 14:52 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on wikitech for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 8 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 7 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 6 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 5 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 4 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on remaining section 3 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 2 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 1 wikis for T232464
  • 14:48 anomie@mwmaint1002: Running cleanupRevActorPage.php on test wikis and mediawikiwiki for T232464
  • 14:39 anomie@deploy1001: Synchronized php-1.34.0-wmf.22/includes/MergeHistory.php: Backport MergeHistory fix for T232464 gerrit:537436 (duration: 00m 54s)
  • 14:35 ottomata: bouncing eventstreams service on scb hosts
  • 14:15 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 14:03 herron: migrating kafka1003 to kafka-main1003 T225005
  • 14:00 jbond42: forcing puppet run
  • 14:00 bblack: lvs1015 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:59 bblack: lvs2003 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:57 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:52 bblack: lvs1016 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:52 bblack: lvs2006 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:45 moritzm: repooling restbase2010 after reimage/completed bootstrap
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 db1104 db1085 db1086 after PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9117 and previous config saved to /var/cache/conftool/dbconfig/20190917-132102-marostegui.json
  • 13:17 godog: force-run puppet in eqiad to update exported resources
  • 13:14 jbond42: currently running octocatalog-diff for all hosts from elnath
  • 13:02 marostegui: Start replication on db1130 db1104 db1085 db1086 after PDU maintenance is completed - T227539
  • 13:01 cmjohnson1: The PDU swap in rack B3 eqiad is finished.
  • 12:30 mobrovac: bootstrap restbase2010-c - T224553
  • 11:32 Urbanecm: EU SWAT is done
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:31 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 290e207: Add channels for the Translate and TranslationsNotification extension (T221119, T144780, T143073) (duration: 00m 56s)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:29 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:27 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Use https rather than protocol-relative remote API URLs (T228851) (duration: 00m 58s)
  • 11:24 cmjohnson1: commencing pdu swap rack b3 eqiad T227539
  • 11:22 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Update ORES filter threshold configuration for new huwiki model (T230031) (duration: 00m 55s)
  • 11:17 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable EditorJourney for euwiki (T232061) (duration: 00m 56s)
  • 11:13 Urbanecm: Run mwscript emptyUserGroup.php --wiki=aawiki 'inactive' (T150538)
  • 10:58 mobrovac: bootstrap restbase2010-b - T224553
  • 10:44 vgutierrez: replacing nginx with ATS in cp1076 (upload cluster) - T231433
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9116 and previous config saved to /var/cache/conftool/dbconfig/20190917-094827-marostegui.json
  • 09:46 marostegui: Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539
  • 09:30 hashar: Restarting CI jenkins
  • 09:29 marostegui: Downtime db1073 db1130 db1104 db1085 db1086 for the PDU maintenance T227539
  • 09:18 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 mobrovac: bootstrap restbase2010-a - T224553
  • 09:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 100% of users who accept cookies - T219150 (duration: 00m 57s)
  • 08:37 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp3034 - T231849 T232724
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1074 with just 50 to keep its warmness level just in case T231638', diff saved to https://phabricator.wikimedia.org/P9115 and previous config saved to /var/cache/conftool/dbconfig/20190917-075807-marostegui.json
  • 07:48 effie: Enable puppet on mw*
  • 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates
  • 07:41 marostegui: Stop mysql on db1063 for decommissioning T232564
  • 07:40 marostegui: Remove db1063 from puppet and zarcillo T232564
  • 07:29 vgutierrez: repooling cp5007 without wikibase configuration - T99531
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 vgutierrez: depooling cp5007 to ensure that wikibase removal goes as expected - T99531
  • 07:10 vgutierrez: getting rid of wikibase TLS certificate & nginx configuration on the text cache cluster - T99531
  • 06:56 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp2002, cp4021 and cp5001 - T231849
  • 06:55 vgutierrez: uploaded trafficserver 8.0.5-1wm8 to apt.wikimedia.org (stretch) - T231849
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1066 T233071', diff saved to https://phabricator.wikimedia.org/P9114 and previous config saved to /var/cache/conftool/dbconfig/20190917-065342-marostegui.json
  • 06:49 moritzm: reimage restbase2010 to Stretch T224553
  • 05:57 vgutierrez: upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - T232724
  • 05:56 vgutierrez: uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - T232298 T232724
  • 05:23 effie: disable puppet on mw* servers for 536979
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 master and remove read-only from s2 T230785', diff saved to https://phabricator.wikimedia.org/P9113 and previous config saved to /var/cache/conftool/dbconfig/20190917-050133-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-only for maintenance T230785', diff saved to https://phabricator.wikimedia.org/P9112 and previous config saved to /var/cache/conftool/dbconfig/20190917-050043-marostegui.json
  • 05:00 marostegui: Starting s2 failover from db1066 to db1122 - T230785
  • 04:57 effie: Downtiming HTTPS-blog on icinga - T232412
  • 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 and depool it from API T230785', diff saved to https://phabricator.wikimedia.org/P9111 and previous config saved to /var/cache/conftool/dbconfig/20190917-041441-marostegui.json
  • 04:11 marostegui: Start s2 pre-switchover steps T230785
  • 00:34 AndyRussG: updated fruec from fb29cb7407 to 97128874bf

2019-09-16

  • 23:53 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgDebugLogFile in VS (duration: 00m 55s)
  • 23:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgDebugLogFile in CS (duration: 00m 55s)
  • 23:42 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgUploadThumbnailRenderHttpCustom* in VS (duration: 00m 54s)
  • 23:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgUploadThumbnailRenderHttpCustom* in CS (duration: 00m 55s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wmgRC2UDPAddress in VS (duration: 00m 55s)
  • 23:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgRC2UDPAddress in CS (duration: 00m 56s)
  • 23:24 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgCopyUploadProxy in VS (duration: 00m 56s)
  • 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgCopyUploadProxy in CS (duration: 00m 55s)
  • 23:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T225261 T194019 Adjust CentralNotice CSP for banner previews for FR-tech (duration: 00m 55s)
  • 22:59 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use __DIR__ rather than global wmfConfgDir (duration: 00m 55s)
  • 21:48 ebernhardson: unban elastic1027 from production-search-eqiad
  • 20:55 XioNoX: remove 2 sessions to AS12871 on cr2-esams - T232617
  • 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:20 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:10 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:08 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:55 XioNoX: reboot scs-a8-eqiad (at 100% CPU)
  • 19:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:55 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:53 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:51 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:35 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:28 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:27 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:19 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:13 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:13 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:09 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:03 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgCookieSetOnAutoBlock and wgCookieSetOnIpBlock to the default; never varied (duration: 00m 56s)
  • 19:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up globals in InitialiseSettings.php (duration: 00m 56s)
  • 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:01 dzahn@cumin1001: Updating IPMI password on 0 hosts - dzahn@cumin1001
  • 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:54 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 18:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Variant configuration: Read JSON config for all wikis (duration: 00m 56s)
  • 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 56s)
  • 18:40 jforrester@deploy1001: Synchronized src/WmfClusters.php: Use static VariantSettings instead of InitialiseSettings (noc-only change) (duration: 00m 55s)
  • 18:40 mutante: phab1001 - racadm racreset
  • 18:21 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Remove globals declaration and use via GLOBALS for testability (duration: 00m 56s)
  • 18:15 Lucas_WMDE: Morning SWAT done
  • 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: bridge: enable EditTags for beta (T232582) (duration: 00m 58s)
  • 18:12 herron: migrating kafka1002 to kafka-main1002 T225005
  • 18:09 mutante: registry2001 - restarting nginx
  • 17:55 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 55s)
  • 17:49 ejegg: updated SmashPig standalone from 5d187092a7 to a0151434f4
  • 17:42 urandom: decommissioning Cassandra, restbase2010-c -- T224553
  • 17:42 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027 due to >1k orphan tasks
  • 17:09 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 54s)
  • 16:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make CommonSettings use mtime from VariantSettings (duration: 00m 55s)
  • 16:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make InitialiseSettings use values from VariantSettings (duration: 00m 54s)
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Establish VariantSettings.php everywhere (duration: 00m 56s)
  • 16:51 ebernhardson: ban elastic1027 from production-search-eqiad-chi
  • 16:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223602 Inject config object into InitialiseSettings-labs rather than use wgConf global (duration: 00m 55s)
  • 15:42 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 56s)
  • 15:41 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 08s)
  • 15:41 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602
  • 15:10 @: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 15:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:54 urandom: decommissioning Cassandra, restbase2010-b -- T224553
  • 14:37 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:25 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:09 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 13:28 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FlaggedRevs/frontend/specialpages/reports/ValidationStatistics.php: Add missing "use" to getTopReviewers() - T232618 (duration: 00m 55s)
  • 13:10 moritzm: rebooting failoid2001 for kernel update/pick up new qemu
  • 13:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.22
  • 12:59 moritzm: installing qemu security updates on stretch
  • 12:58 urandom: decommissioning Cassandra, restbase2010-a -- T224553
  • 12:44 godog: stop thumbor traffic to statsd/graphite, use Prometheus only and replace Thumbor dashboard - T205870
  • 12:40 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 12:17 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:07 _joe_: rolling restart ended on eqiad T232613
  • 11:56 _joe_: rolling restart of php-fpm in eqiad to pick up the new memcached extension T232613
  • 11:50 _joe_: rolling restart of php-fpm in codfw to pick up the new memcached extension T232613
  • 11:43 Urbanecm: EU SWAT is done
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: e37aed2: Remove expired throttle rules (duration: 01m 03s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 313e3d9: Increase move rate-limit on Commons for all autopatrolled users (T232657) (duration: 01m 05s)
  • 11:33 jbond42: update peer address of AS28598
  • 11:30 effie: Upgrading php-memcached to 3.0.1+2.2.0-1~wmf3
  • 11:30 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Send a User-Agent with remote API requests (T232840) (duration: 01m 02s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 869b56f: Lift IP cap on 2019-10-02 for Senior Citizen Write Wikipedia course - cs.wikipedia (T232831) (duration: 01m 02s)
  • 11:21 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable File Importer source wiki edits on beta cluster (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable source wiki editing for testwiki (T228851) (duration: 01m 02s)
  • 11:10 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Add debug logging for remote API failures (T228851) (duration: 01m 05s)
  • 11:06 _joe_: uploaded php-memcached_3.0.1+2.2.0-1~wmf3 to component/php72 for stretch T232613
  • 10:52 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 10:51 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 10:50 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 10:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 10:45 vgutierrez: Enabling OCSP prefetched responses for the non-canonical redirect service - T232988
  • 10:29 _joe_: installing a patched php-memcached on mw1347 T232613
  • 10:16 vgutierrez: upgrade acme-chief production servers to acme-chief 0.21 - T219765
  • 10:16 moritzm: upload libtrapperkeeper-webserver-jetty9-clojure 1.7.0-2+wmf1 to buster-wikimedia
  • 10:05 vgutierrez: restarting acmechief servers to get latest kernel upgrades
  • 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 vgutierrez: replacing nginx with ATS in cp3034 (upload cluster) - T231433
  • 08:56 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Beta: enable the Parsoid extension - T231569 (duration: 01m 01s)
  • 08:50 marostegui: Apply grants for dbproxy1021 on db1133 (m5 master) with replication - T202367
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 moritzm: installing faad2 security updates
  • 07:15 moritzm: repooling restbase2009
  • 06:48 marostegui: Stop MySQL on db1114 to upgrade it to 10.3
  • 06:04 marostegui: Stop MySQL on db2054 for decommissioning T232969
  • 06:01 marostegui: Remove db2054 from tendril and zarcillo T232969
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2054 from config T232969 (duration: 01m 03s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2054 from config T232969 (duration: 01m 05s)

2019-09-15

  • 16:51 Krinkle: Fixed a dozen abuse filters, listed at https://phabricator.wikimedia.org/T156096#5494060. Removed the trailing pipe character from the filters that had it, since it will no longer be supported in a future version of AbuseFilter.
  • 14:35 _joe_: test: setting opcache.interned_strings_buffer to 0 on mw1348 for T232613

2019-09-14

  • 23:42 onimisionipe: force shard allocation (dewiki_content_1566659363[4]) on eqiad cluster
  • 04:39 effie: Depool and reload mw1286
  • 01:14 ejegg: updated fundraising python tools from 1e405864d7 to e1b81688c6
  • 00:29 ejegg: updated payments-wiki from 1f556670cf to fc82318180

2019-09-13

  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 gehel: re-enable puppet on maps - T232817
  • 20:23 chaomodus: restarting netbox1001.wikimedia.org
  • 20:00 twentyafterfour: hotfixing T232600 due to severity of the bug and relative safety of the fix (if this breaks, yell at James_F who twisted my arm and made me do it)
  • 19:54 urandom: bootstrapping Cassandra, restbase2009-c -- T224553
  • 17:24 urandom: bootstrapping Cassandra, restbase2009-b -- T224553
  • 16:10 XioNoX: fix bgp group netflow on cr2-codfw
  • 15:47 urandom: bootstrapping Cassandra, restbase2009-a -- T224553
  • 15:43 effie: reverting live hacks on mw1348
  • 15:34 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable adhoc core dump logging - T232613 (duration: 01m 04s)
  • 15:14 akosiaris: upload apertium-dan_0.6.0-1+wmf3 apertium-nno_1.0.0-1+wmf1 apertium-nob_1.0.0-2+wmf1 apertium-swe_0.8.0-1+wmf1 to apt.wikimedia.org/jessie-wikimedia T218184
  • 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:02 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Add more log and context for T232613 logging - T232613 (duration: 01m 04s)
  • 15:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 moritzm: installing cups security update on buster (only client-side libs installed)
  • 14:22 moritzm: installing bzip2 update from Buster 10.1 point release
  • 14:18 moritzm: installing reportbug update from Buster 10.1 point release
  • 14:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:05 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:57 oblivian@deploy1001: Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s)
  • 13:28 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:27 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:21 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:20 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:19 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:56 _joe_: banning more urls on maps1003
  • 12:37 _joe_: temp ban of class of urls on maps1003 nginx
  • 12:14 jbond42: add timing information to maps1003 access logs
  • 11:39 jbond42: enable access logs on maps1003
  • 11:38 _joe_: manually raising the worker heap limit to 600 MB on kartotherian on maps1003
  • 11:11 elukey: reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades
  • 11:10 elukey: reboot an-tool1007 (runs turnilo) for kernel upgrades
  • 11:08 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 godog: silence kartotherian pages for 2h, known issue
  • 10:47 vgutierrez: rebooting acmechief-test servers to catch up latest kernel upgrades
  • 10:42 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:41 moritzm: reimage restbase2009 to stretch T224553
  • 10:38 moritzm: repool restbase1018 after reimage to stretch and completed Cassandra bootstrap
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:13 vgutierrez: disable ATS-TLS debug options on cp5001 - T232298
  • 10:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 09:46 gehel: re-enabling /geoline on maps1004 - T232817
  • 09:45 @: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:44 @: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:40 godog: install linux-perf-4.9 on maps1002 and attempt to capture a stack sample
  • 09:38 gehel: drop /geoshape and restart kartotherian on maps1004 - T232817
  • 09:27 gehel: restart kartotherian on maps1004 - T232817
  • 09:24 gehel: deny access to /geoline on maps1004 - T232817
  • 09:11 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 09:08 godog: downtime kartotherian pages for 1h in codfw
  • 09:01 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet
  • 09:00 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet
  • 08:57 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:52 godog: downtime kartotherian pages for 1h
  • 08:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 08:48 jmm@cumin2001: Updating IPMI password on 1 hosts - jmm@cumin2001
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:47 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:45 gehel: stop tilerator on maps to help reduce load
  • 08:37 _joe_: rolling restart of kartotherian
  • 08:33 _joe_: restarting kartotherian on maps1003, all workers seem stuck
  • 05:58 oblivian@deploy1001: Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s)
  • 05:40 _joe_: live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken
  • 05:17 effie: Rolling restart php-fpm across the fleet for 536400
  • 04:53 vgutierrez: restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849
  • 04:50 eileen: process-control config revision is 43a2677bcf - turned off gender import
  • 02:23 eileen: civicrm revision changed from c5ab5aea9e to 45dbfdb96f, config revision is 1da8391a9a
  • 01:09 XioNoX: add IPv6 sampling to cr1-eqiad
  • 01:07 XioNoX: enable netflow sampling on cr2-codfw

2019-09-12

  • 23:35 XioNoX: enable netflow sampling on cr1-codfw
  • 23:21 urandom: decommissioning Cassandra, restbase2009-b -- T224553
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Read config from JSON, not serialised PHP on testwiki (duration: 01m 03s)
  • 23:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: T223602 Add ability to read config from JSON, not serialised PHP (duration: 01m 04s)
  • 23:10 eileen: process-control config revision is 1da8391a9a
  • 22:53 ayounsi@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:43 ayounsi@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:43 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:20 XenoRyet: payments-wiki updated from 4ebbdb247d to 1f556670cf
  • 22:14 XioNoX: remove extra prepend in AMS-IX
  • 21:18 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Hardcode posix signal and log coredump - T232613 (duration: 01m 04s)
  • 21:17 mbsantos@deploy1001: Finished deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0 (duration: 03m 18s)
  • 21:14 mbsantos@deploy1001: Started deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0
  • 21:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0 (duration: 03m 52s)
  • 21:09 mbsantos@deploy1001: Started deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0
  • 21:00 urandom: decommissioning Cassandra, restbase2009 -- T224553
  • 20:33 krinkle@deploy1001: Synchronized wmf-config/: d495d5e24949 (duration: 01m 03s)
  • 20:28 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: d495d5e24949 (duration: 01m 04s)
  • 20:27 eileen: civicrm revision changed from 4075e396d5 to f00c6482bf, config revision is 635f198b92
  • 20:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only (duration: 01m 02s)
  • 20:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 04s)
  • 20:02 moritzm: installing firmware-nonfree update from Buster 10.1 point release
  • 19:51 moritzm: installing systemd bugfix update from Buster 10.1 point release
  • 19:44 moritzm: installing 4.19.67 kernel from 10.1 point release on Buster systems
  • 19:34 urandom: bootstrapping Cassandra, restbase1018-c -- T224553
  • 18:59 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable coredump on some mysterious php7.2 failure - T232613 (duration: 01m 04s)
  • 18:32 moritzm: installing gdb updates from buster 10.1 point release
  • 18:28 bblack: lvs1016: restart pybal to revert test
  • 18:21 bblack: lvs1016: restart pybal to test dual bgp peering
  • 18:04 bblack: lvs1015: restart pybal to return BGP session to cr2 - T226424
  • 18:03 bblack: lvs1014: restart pybal to return BGP session to cr2 - T226424
  • 17:58 XioNoX: revert VRRP priority change cr2-eqiad - T226424
  • 17:54 XioNoX: revert OSPF priority change on cr2-eqiad - T226424
  • 17:53 XioNoX: re-enabled external BGP on cr2-eqiad - T226424
  • 17:46 urandom: bootstrapping Cassandra, restbase1018-b -- T224553
  • 17:43 XioNoX: reboot cr2-eqiad - T226424
  • 17:40 XioNoX: failover cr2-eqiad master RE from RE1 to RE0 - T226424
  • 17:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: T232613 Add ability to core dump on empty string array key that should exist (wmf.22 only, flagged off) (duration: 01m 03s)
  • 17:31 XioNoX: power off re0.cr2-eqiad - T226424
  • 17:25 XioNoX: failover cr2-eqiad master RE from RE0 to RE1 - T226424
  • 17:19 halfak@deploy1001: Finished deploy [ores/deploy@7d45b80]: T232660 (duration: 13m 41s)
  • 17:05 halfak@deploy1001: Started deploy [ores/deploy@7d45b80]: T232660
  • 17:04 XioNoX: power off re1.cr2-eqiad - T226424
  • 17:02 moritzm: installing unzip security updates on buster
  • 17:00 XioNoX: +1000 metric to all transport to/from cr2-eqiad - T226424
  • 16:57 moritzm: installing libxslt security updates on buster
  • 16:49 XioNoX: Deactivate IX/transit/private-peer v4/v6 BGP on cr2-eqiad - T226424
  • 16:47 moritzm: installing NSS security updates on buster
  • 16:42 XioNoX: er, switch VRRP master to cr1-eqiad - T226424
  • 16:42 XioNoX: switch VRRP master to cr2-eqiad - T226424
  • 16:36 bblack: lvs1013: restart pybal to move bgp session to cr1 - T226424
  • 16:36 bblack: lvs1014: restart pybal to move bgp session to cr1 - T226424
  • 16:35 bblack: lvs1015: restart pybal to move bgp session to cr1 - T226424
  • 16:34 bblack: lvs1016: restart pybal to move bgp session to cr1 - T226424
  • 16:19 XioNoX: rollback force VRRP backup on cr1-eqiad - T226424
  • 16:16 XioNoX: activate CF tunnel on cr1-eqiad - T226424
  • 16:16 XioNoX: activate transit4/6 on cr1-eqiad - T226424
  • 16:09 urandom: bootstrapping Cassandra, restbase1018-a -- T224553
  • 16:04 XioNoX: reboot cr1-eqiad - T226424
  • 16:01 XioNoX: force offline/online of FPC3 on cr1-eqiad
  • 15:45 XioNoX: failover master RE from RE1 to RE0 on cr1-eqiad - T226424
  • 15:39 XioNoX: deactivate transit4/6 on cr1-eqiad - T226424
  • 15:31 XioNoX: shutdown re0.cr1-eqiad - T226424
  • 15:23 XioNoX: failover master RE from RE0 to RE1 on cr1-eqiad - T226424
  • 15:13 XioNoX: shutdown re1.cr1-eqiad - T226424
  • 15:05 XioNoX: disable primary tunnel to CF in eqiad (for real this time, I did see an uptake of traffic on backup link before the rollback)
  • 15:03 XioNoX: rolled back disable primary tunnel to CF in eqiad
  • 15:02 XioNoX: disable primary tunnel to CF in eqiad
  • 14:53 bblack: restart pybal on lvs1013 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:50 bblack: restart pybal on lvs1016 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:45 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:41 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:39 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:37 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:29 XioNoX: ensure cr1-eqiad is vrrp backup for all groups - T226424
  • 13:22 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:57 effie: restarting hhvm on mw1233 and repooling
  • 12:56 effie: depool mw1233
  • 12:38 moritzm: reimaging restbase1018 to stretch
  • 12:03 Amir1: EU SWAT is done
  • 12:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q20mio (T225055) (duration: 01m 31s)
  • 11:11 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:11 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:00 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:42 jynus: compressing tables on labsdb1012 T232446
  • 08:22 vgutierrez: upgrading to acme-chief 0.21 on acmechief-test instances - T219765
  • 08:17 vgutierrez: restarting pybal on lvs1015 and lvs2003 - T176875
  • 08:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wdqs,service=wdqs-heavy-queries
  • 08:11 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=puppetmaster1001.eqiad.wmnet,service=wdqs-heavy-queries
  • 08:07 vgutierrez: restarting pybal on lvs2006 - T176875
  • 08:02 vgutierrez: restarting pybal on lvs1016 - T176875
  • 07:45 vgutierrez: uploaded acme-chief 0.21 to apt.wikimedia.org (buster) - T219765
  • 06:51 vgutierrez: restarting ATS-TLS on cp4021 and cp2002 to get the new SSL session cache size - T232298
  • 06:00 marostegui: Stop MySQL on db1073 for decommission T231892
  • 05:59 marostegui: Remove db1073 from tendril and zarcillo T231892
  • 05:26 _joe_: restarting strongswan on all eqiad caches that need it
  • 05:23 _joe_: restarting strongswan on cp1077
  • 03:37 eileen: civicrm revision changed from 32cd5e4953 to 4075e396d5, config revision is 3e22a80bc8
  • 02:13 eileen: civicrm revision changed from 53aeba6318 to 32cd5e4953, config revision is 3e22a80bc8
  • 02:03 XioNoX: repooling ulsfo

2019-09-11

  • 23:50 ejegg: updated payments-wiki from 5432f9c3a4 to 4ebbdb247d
  • 23:20 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.197` on cr2-eqiad
  • 22:43 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.196` on cr1-eqiad
  • 22:36 XioNoX: add BGP session between cr2-eqord and netflow1001
  • 22:30 urandom: decommissioning Cassandra, restbase1018-c -- T224553
  • 20:57 urandom: bootstrapping Cassandra, restbase-dev1005-b -- T224554
  • 20:21 ottomata: stopped and removed eventlogging-service-eventbus - T232122
  • 20:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@522177f]: Clean up old event style support (duration: 01m 39s)
  • 20:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@522177f]: Clean up old event style support
  • 20:07 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049 (duration: 00m 53s)
  • 20:06 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049
  • 18:43 urandom: decommissioning Cassandra, restbase1018-b -- T224553
  • 18:42 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211124 ed8dd7aad9e5 (duration: 01m 04s)
  • 18:42 nuria@deploy1001: Finished deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect (duration: 08m 39s)
  • 18:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op ed8dd7aad9e5 (duration: 01m 06s)
  • 18:37 krinkle@deploy1001: Synchronized tests/: no-op ed8dd7aad9e5 (duration: 01m 05s)
  • 18:33 nuria@deploy1001: Started deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect
  • 18:16 krinkle@deploy1001: Synchronized wmf-config/logging.php: d6865e3365e8 - T211124 (duration: 01m 04s)
  • 18:16 nuria@deploy1001: Finished deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery (duration: 01m 21s)
  • 18:15 nuria@deploy1001: Started deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery
  • 18:02 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/WikimediaMaintenance/blameStartupRegistry.php: (no justification provided) (duration: 01m 05s)
  • 17:57 XioNoX: upgrade librenms to 1.55
  • 17:43 ayounsi@deploy1001: Finished deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599 (duration: 00m 09s)
  • 17:42 ayounsi@deploy1001: Started deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599
  • 17:32 bblack: enable GRE MTU mitigation on eqsin caches (cp5xxx) - T232602
  • 17:27 bblack: restbase2009 - re-pool - T227408
  • 17:07 bblack: restbase2009 - shutdown for hardware work - T227408
  • 17:05 bblack: restbase2009 - depool for hardware work - T227408
  • 16:57 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c0fd061: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 02s)
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka100[23]
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka-main1001
  • 16:50 bblack: manually removed decommed eventbus LVS IP on kafka-main200[23]
  • 16:49 bblack: manually removed decommed eventbus LVS IP on kafka-main2001
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6007fbc: [rowiki] Allow sysops to remove patrollers (T231099) (duration: 01m 03s)
  • 16:39 urandom: decommissioning Cassandra, restbase1018-a -- T224553
  • 16:38 Urbanecm: Run mwscript emptyUserGroup.php --wiki=fawiki OTRS-member (T232554)
  • 16:36 bblack: ran conftool-merge on puppetmaster1001 (manually from sudo -i, to fixup missing updates)
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 76991f2: Remove OTRS-member usergroup from fawiki (T232554) (duration: 01m 05s)
  • 16:32 Urbanecm: mwscript importImages.php --wiki=commonswiki --user=Abbe98 --comment-ext=txt /home/urbanecm/T232346
  • 16:31 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c45d6d0: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 03s)
  • 16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 565fafa: Set noindex for user and user_talk on zhwiki (T231982) (duration: 01m 05s)
  • 16:24 urandom: bootstrapping Cassandra, restbase-dev1005-a -- T224554
  • 16:16 bblack@cumin1001: conftool action : set/pooled=no; selector: cluster=eventbus
  • 16:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 510aa6b: Add new whitelist rule for Université de Lorraine course (T232596) (duration: 01m 04s)
  • 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: eceaccf: Add autopatrolled user group to az.wikibooks (T231493) (duration: 01m 06s)
  • 15:52 bblack: lvs1015 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:51 bblack: lvs2003 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:49 bblack: lvs1016 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:48 bblack: lvs2006 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:03 bblack: downtimed dns-discovery confd health checks for eventbus - T232122
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.22 (duration: 01m 02s)
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.22
  • 12:48 moritzm: upgrade labpuppetmaster* to use facter 3 / puppet 5
  • 12:40 moritzm: removing now obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 12:40 moritzm: removing now obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 11:59 hashar: Restarting Gerrit due to deadlock in the account cache # T224448
  • 11:57 bblack: applying GRE MTU -> MSS fixup to cobalt and gerrit2001 - T218184
  • 11:41 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.21/maintenance/getReplicaServer.php: SWAT: maintenance/getReplicaServer.php: Remove reference to long-deleted config var (T232268) (duration: 01m 04s)
  • 11:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AMC Outreach modal (T231436) (duration: 01m 04s)
  • 11:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q10mio (T225055) (duration: 01m 03s)
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: TR: set WikibaseTaintedReferencesEnabled true on labs wikidatawiki (T232191) (duration: 01m 03s)
  • 10:57 mobrovac: drop the wiktionary definition keyspace - T231361
  • 10:23 moritzm: removed roentgenium/tureis in Ganeti T224559
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:17 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:01 jynus: stopping and upgrading db1074
  • 09:56 jynus: upgrading mariadb client library on mariadb root clients
  • 09:46 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 50% - T219150 (duration: 01m 03s)
  • 09:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a (duration: 12m 15s)
  • 09:32 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a
  • 09:32 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3 (duration: 13m 18s)
  • 09:19 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3
  • 09:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2 (duration: 03m 59s)
  • 09:13 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2
  • 09:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449 (duration: 03m 24s)
  • 09:08 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449
  • 08:36 mobrovac@deploy1001: Finished deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361 (duration: 02m 13s)
  • 08:34 mobrovac@deploy1001: Started deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361
  • 08:24 mobrovac@deploy1001: Finished deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition (duration: 00m 34s)
  • 08:24 mobrovac@deploy1001: Started deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition
  • 08:22 mobrovac@deploy1001: Finished deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361 (duration: 02m 45s)
  • 08:19 mobrovac@deploy1001: Started deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361
  • 08:13 elukey: add thirdparty/amd-rocm271 to buster-wikimedia and update it with ROCm 2.7.1 packages
  • 08:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:07 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm27 (not used anymore)
  • 08:07 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P9088 and previous config saved to /var/cache/conftool/dbconfig/20190911-080450-marostegui.json
  • 07:52 moritzm: reimaging restbase-dev1005 to Stretch T224554
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9087 and previous config saved to /var/cache/conftool/dbconfig/20190911-075139-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9086 and previous config saved to /var/cache/conftool/dbconfig/20190911-073335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9085 and previous config saved to /var/cache/conftool/dbconfig/20190911-072344-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9084 and previous config saved to /var/cache/conftool/dbconfig/20190911-071450-marostegui.json
  • 07:07 marostegui: Stop MySQL on db1122 to reboot for a kernel upgrade T230785
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to reboot for kernel upgrade T230785', diff saved to https://phabricator.wikimedia.org/P9083 and previous config saved to /var/cache/conftool/dbconfig/20190911-070635-marostegui.json
  • 07:00 hashar: Restarting Gerrit - T224448
  • 06:58 hashar: Restarting Gerrit
  • 06:45 marostegui: Drop unused database puppet on m1 - T231539
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9082 and previous config saved to /var/cache/conftool/dbconfig/20190911-061924-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9081 and previous config saved to /var/cache/conftool/dbconfig/20190911-061659-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2048, will be decommissioned T230106', diff saved to https://phabricator.wikimedia.org/P9080 and previous config saved to /var/cache/conftool/dbconfig/20190911-054855-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P9079 and previous config saved to /var/cache/conftool/dbconfig/20190911-054753-marostegui.json
  • 05:29 marostegui: Switchover s1 codfw master db2048 -> db2112 T230106
  • 03:31 eileen: civicrm revision changed from b343642c76 to 53aeba6318, config revision is 3e22a80bc8

2019-09-10

  • 20:46 ejegg: updated payments-wiki from 15baf7f58b to 5432f9c3a4
  • 20:24 XioNoX: add MSS clamp on install1002 - T232456
  • 20:20 XioNoX: add MSS clamp on archiva1001 - T232456
  • 18:42 herron: rolling out "Aggregate IPsec Tunnel Status" icinga check, please disregard for the time being if it alerts
  • 18:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T229863 Remove EventBusRCFeedEngine eventServiceName (duration: 01m 05s)
  • 18:15 XioNoX: rollback test add static route on bast3002 to force advmss
  • 18:10 XioNoX: test add static route on bast3002 to force advmss
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/logging.php: T232042 Direct Parsoid/PHP rt-testing log events to a different target (duration: 01m 02s)
  • 17:56 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: T232122 Stop setting production value for eventlogging-service (duration: 01m 00s)
  • 17:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232122 Remove use of eventlogging-service (duration: 01m 03s)
  • 17:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-sync for safety after scap errored with a broken pipe (duration: 01m 03s)
  • 17:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write to static (JSON) as well as serialised cache for testwiki T223602 (duration: 01m 02s)
  • 17:29 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Be able to write to static (JSON) as well as serialised cache (duration: 01m 03s)
  • 16:35 elukey: reboot analytics-tool1001 via ganeti gnt - not reachable via ssh
  • 16:24 urandom: disabling reserved space on restbase-dev1005:/dev/mapper/restbase--dev1005--vg-srv -- T224554
  • 16:10 marostegui: Failover m1 from db1063 to db1135 - T231403
  • 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set items term store on write both for all of Wikidata" (duration: 01m 02s)
  • 15:58 thcipriani: restarting gerrit (again) https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&from=1568109359163&to=1568130959163&var-Application=&var-Window=30m due to T224448
  • 15:39 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.22
  • 15:37 marostegui: Start pre-switchover for m1 steps T231403
  • 15:35 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: Revert "Improve MultiHttpClient connection concurrency and reuse" - T232487 (duration: 00m 55s)
  • 15:33 reedy@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: T232487 (duration: 00m 55s)
  • 15:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 to 1.34.0-wmf.22 # T220747
  • 14:48 hashar@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:45 akosiaris: repool cp1075 ats-be, releases cert updated
  • 14:44 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 14:44 XioNoX: depool ulsfo for DC UPS power maintenance (see maint-announce)
  • 14:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:32 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747 (duration: 34m 03s)
  • 14:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:29 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:26 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 ottomata: increasing max_body_size to 10mb for all eventgate services - T232362
  • 14:14 akosiaris: depool cp1075 ats-be to test helmfile sync
  • 14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 13:58 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747
  • 13:56 hashar: Applied security patches to 1.34.0-wmf.22 # T220747
  • 13:53 hashar: scap prep 1.34.0-wmf.22 # T220747
  • 13:34 elukey: reboot stat1005 to clear inconsistent process state after tensorflow tests
  • 13:23 hashar: ./make-wmf-branch -n 1.34.0-wmf.22 -o master -c extensions/CharInsert # T220747
  • 13:12 thcipriani: restarting gerrit
  • 13:11 hashar: Gerrit experiencing difficulty due to ongoing wmf branch cut - T231872
  • 13:01 moritzm: copied prometheus-jmx-exporter to buster-wikimedia (from stretch-wikimedia, just a package with some jars)
  • 12:40 cmjohnson1: the new pdus are racked in b6
  • 12:14 cmjohnson1: removing power from ps1-b6 side B...mgmt should not be affected
  • 11:20 cmjohnson1: swapping the PDU in rack B6 eqiad T227541
  • 11:09 Urbanecm: EU SWAT done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)
  • 11:07 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,dc=eqiad,name=cp1075.eqiad.wmnet
  • 11:06 ema: cp1075: set weight in etcd back to 100
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6afe963: Set items term store on write both for all of Wikidata (T225055) (duration: 00m 55s)
  • 10:51 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:32 vgutierrez: repool cp5001 with ats-tls collecting memory usage details every hour - T232298
  • 09:56 elukey: restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck)
  • 09:50 moritzm: installing ghostscript security updates on jessie
  • 09:37 moritzm: added jbond as chanserv ops for #wikimedia-operations
  • 08:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:42 moritzm: reimaging mw2231 after hardware maintenance T231192
  • 07:21 moritzm: iron.wikimedia.org is no longer a bastion host
  • 06:57 moritzm: upgrading snapshot* to PHP 7.2.22 T230024
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1073 from config T231892 (duration: 00m 54s)
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1073 from config T231892 (duration: 00m 55s)
  • 05:35 marostegui: Stop MySQL on db2047 T231852
  • 05:35 marostegui: Remove db2047 from tendril and zarcillo - T231852
  • 05:33 urandom: decommissioning Cassandra, restbase-dev1005-b -- T224554
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1104 into API T230762', diff saved to https://phabricator.wikimedia.org/P9071 and previous config saved to /var/cache/conftool/dbconfig/20190910-051529-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 master and remove read-only from s8 T227062', diff saved to https://phabricator.wikimedia.org/P9070 and previous config saved to /var/cache/conftool/dbconfig/20190910-050213-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 as read-only for maintenance T230762', diff saved to https://phabricator.wikimedia.org/P9069 and previous config saved to /var/cache/conftool/dbconfig/20190910-050046-marostegui.json
  • 05:00 marostegui: Starting s8 failover from db1104 to db1109 - T227062
  • 04:46 vgutierrez: depool cp5001 for memory leak debugging on ATS - T232298
  • 04:23 marostegui: Start topology changes on s8, connect everything under db1109 - T230762
  • 04:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 and depool it from API T230762', diff saved to https://phabricator.wikimedia.org/P9068 and previous config saved to /var/cache/conftool/dbconfig/20190910-042243-marostegui.json
  • 04:18 marostegui: Start s8 (wikidata) pre switchover steps T230762
  • 00:59 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 00:59 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 00:57 Krinkle: krinkle@deploy1001: Deploy performance/navtiming f2a0863 - T226539
  • 00:41 urandom: decommissioning Cassandra, restbase-dev1005-a -- T224554

2019-09-09

  • 23:44 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/skins/MinervaNeue/: T232260 (duration: 00m 57s)
  • 22:28 ejegg: updated payments-wiki from 51d9ed79b6 to 15baf7f58b
  • 20:50 urandom: bootstrapping Cassandra, restbase-dev1004-b -- T224554
  • 19:48 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 05m 45s)
  • 19:42 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 19:41 mdholloway: mobileapps deployment failed repooling canary (scb2001); retrying
  • 19:40 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 02m 59s)
  • 19:37 XioNoX: fix eqsin CF tunnel misconfig
  • 19:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 17:56 andrewbogott: disabling puppet on labpuppetmaster1001 as part of T171188
  • 17:55 XioNoX: push cloudflare tunnel config to cr1-eqsin
  • 16:50 papaul: replacing Fan kit and power supplies on cr1-codfw
  • 14:22 urandom: bootstrapping Cassandra, restbase-dev1004-a -- T224554
  • 13:51 vgutierrez: upgrading ats to 8.0.5-1wm6 on cp5001 - T232298
  • 13:39 vgutierrez: uploaded trafficserver 8.0.5-1wm6 to apt.wikimedia.org (stretch) - T232298
  • 13:31 moritzm: installing facter update from buster 10.1 point release (T222356)
  • 13:15 moritzm: upgrading labweb/wikitech to PHP 7.2.22 T230024
  • 13:02 Urbanecm: Patch is deployed, deploy1001 should be clear
  • 13:01 moritzm: upgrading remaining mediawiki app servers (mw1266-mw1275) to PHP 7.2.22 T230024
  • 12:55 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/WikibaseMediaInfo/: ubn patch T231276 (duration: 00m 58s)
  • 12:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/Wikibase: ubn patch T231276 (duration: 01m 03s)
  • 12:48 moritzm: upgrading remaining job runners to PHP 7.2.22 T230024
  • 12:44 Urbanecm: EU SWAT wmf patch ongoing, testing with mwdebug1002
  • 12:41 ema: lvs1015 (primary): restart pybal to add service restbase-ssl T210411
  • 12:36 ema: lvs2003 (primary): restart pybal to add service restbase-ssl T210411
  • 12:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=eqiad
  • 12:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=codfw
  • 12:29 elukey: restart archiva again to debug download artifact issue
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase1022.eqiad.wmnet
  • 12:11 Urbanecm: Undeployed patch in wmf branch, will resolve soon
  • 12:01 moritzm: installing ldap-corp1001 T231015
  • 11:32 Urbanecm: Dry run for all wikis (T231137)
  • 11:26 moritzm: installing ldap-corp2001 T231015
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 10:22 effie: jiji@deploy1001:~$ scap sync-file wmf-config/CommonSettings.php "Push PHP7 traffic to 33.3% - T219150"
  • 09:48 moritzm: updated stretch netinst image to 9.11 T232308
  • 09:42 eileen: civicrm revision changed from d1d65f37ea to 516eeb54b5, config revision is 5a6a9c6c03
  • 09:40 moritzm: updated buster netinst image to 10.1 T232310
  • 09:28 ema: lvs1016, lvs2006 (secondaries): restart pybal to add service restbase-ssl T210411
  • 09:02 elukey: restart archiva on archiva1001 - stuck and not serving requests (no trace about why in the logs)
  • 08:55 eileen: civicrm revision is d1d65f37ea, config revision is 5a6a9c6c03
  • 08:38 vgutierrez: disabling systemd hardening for ats-tls on cp5001 - T232298
  • 07:33 moritzm: installing ghostscript security updates
  • 03:53 vgutierrez: reboot analytics-tool1001
  • 02:59 bd808: Testing twitter integration after software update for Stashbot. In theory messages up to 280 characters in length will now be passed through to the @wikimediatech Twitter feed without being truncated. This message should end with a unicorn face if that is correct. 🦄

2019-09-08

2019-09-06

  • 21:33 cdanis: cdanis@mw1317.eqiad.wmnet ~ 🕠🍺 sudo -i depool
  • 21:27 James_F: mw1317 seems corrupted (Fatal error: Class undefined: stdClass in /srv/mediawiki/php-1.34.0-wmf.21/includes/libs/rdbms/database/DatabaseMysqli.php); running scap pull
  • 18:01 godog: silence esams pages for 30m
  • 17:43 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux (duration: 02m 55s)
  • 17:40 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux
  • 17:39 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3 (duration: 00m 21s)
  • 17:38 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3
  • 17:26 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 2 (duration: 00m 37s)
  • 17:25 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T