You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log

2019-11-16

  • 20:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:25 ariel@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:17 effie: restart rsyslog on mw2221
  • 09:43 elukey: systemctl restart hadoop-* on analytics1077 after oom killer

2019-11-15

  • 22:14 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:31 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:29 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 _joe_: disabling proxying to ws on phabricator1003
  • 20:04 XioNoX: push pfw policies to pfw3-eqiad - T238368
  • 20:02 XioNoX: push pfw policies to pfw3-codfw - T238368
  • 19:07 XioNoX: remove vlan 1 trunking between msw1-codfw and mr1-codfw, will cause a quick connectivity issue - T228112
  • 18:07 XioNoX: homer push on management switches
  • 17:30 mutante: phabricator - started phd service
  • 17:11 XioNoX: homer push to management routers (https://gerrit.wikimedia.org/r/550576)
  • 16:43 hashar: Restored zuul-merger / CI for operations/puppet.git
  • 16:29 hashar: CI slowed down due to a huge spike of internal jobs. Being flushed as of now # T140297
  • 16:25 bblack: repool cp2001
  • 16:08 bblack: depool cp2001 for experiments
  • 16:02 moritzm: rebooting rpki1001 to rectify microcode loading
  • 16:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:51 ejegg: updated Fundraising CiviCRM from ae9b3819cd to c05c302e54
  • 15:36 ejegg: reduced batch size of CiviCRM contact deduplication jobs
  • 15:11 ema: pool cp3064 with ATS backend T227432
  • 15:07 ema: reboot cp3064 after reimage
  • 14:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:49 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:25 ema: depool cp3064 and reimage as text_ats T227432
  • 14:17 godog: SIGHUP prometheus@ops on prometheus1004
  • 14:13 bblack: lvs1013 - pybal restart for new config
  • 14:13 bblack: lvs2001 - pybal restart for new config
  • 14:13 bblack: lvs5001 - pybal restart for new config
  • 14:13 bblack: lvs4005 - pybal restart for new config
  • 14:12 bblack: lvs3005 - pybal restart for new config
  • 14:11 bblack: lvs5003 - pybal restart for new config
  • 14:11 bblack: lvs4007 - pybal restart for new config
  • 14:11 bblack: lvs3007 - pybal restart for new config
  • 14:10 bblack: lvs2004 - pybal restart for new config
  • 14:09 bblack: lvs1016 - pybal restart for new config
  • 13:28 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 03s)
  • 13:28 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 13:06 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure) (duration: 00m 04s)
  • 13:06 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (expecting failure)
  • 11:43 ariel@deploy1001: Finished deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts (duration: 00m 09s)
  • 11:43 ariel@deploy1001: Started deploy [dumps/dumps@61090ee]: configuration setting to produce empty abstracts
  • 11:27 moritzm: reboot ganeti4001-4003 to rectify microcode application
  • 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315 into vslow,dump after schema change', diff saved to https://phabricator.wikimedia.org/P9645 and previous config saved to /var/cache/conftool/dbconfig/20191115-112520-marostegui.json
  • 11:19 marostegui: Reboot dbproxy2002
  • 11:15 marostegui: Reboot dbproxy2004
  • 11:12 marostegui: Reboot dbproxy2001
  • 10:45 marostegui: Run maintain-views for s5 on labsdb1011 T233135
  • 10:38 moritzm: installing ghostscript security updates
  • 10:37 mobrovac: restbase - truncated parsoidphp data tables - T229015
  • 10:36 ema: pool cp3062 with ATS backend T227432
  • 10:24 godog: roll-restart logstash to apply configuration change
  • 10:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 ema: depool cp3062 and reimage as text_ats T227432
  • 09:47 vgutierrez: Use a synthetic warning for 1% of TLSv1/TLSv1.1 pageviews - T238038
  • 09:18 vgutierrez: Move cp1079 from nginx to ats-tls - T231627
  • 09:13 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 09:02 vgutierrez: Move cp1077 from nginx to ats-tls - T231627
  • 08:42 vgutierrez: Move cp2006 from nginx to ats-tls - T231627
  • 08:30 vgutierrez: Move cp2004 from nginx to ats-tls - T231627
  • 06:41 marostegui: Stop MySQL on db2065 to clone db2134 (this will trigger an haproxy irc alert) - T238183
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change and temporary pool db1082 into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9643 and previous config saved to /var/cache/conftool/dbconfig/20191115-060807-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 for compression', diff saved to https://phabricator.wikimedia.org/P9642 and previous config saved to /var/cache/conftool/dbconfig/20191115-060425-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 db1082 after schema changes', diff saved to https://phabricator.wikimedia.org/P9641 and previous config saved to /var/cache/conftool/dbconfig/20191115-060300-marostegui.json
  • 05:57 marostegui: Run maintain-views for s5 on labsdb1009, labsdb1010, labsdb1012 (pending labsdb1011 as it is still running the schema change) T233135
  • 05:07 vgutierrez: Move cp3064 from nginx to ats-tls - T231627
  • 04:38 volker-e@deploy1001: Finished deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:38 volker-e@deploy1001: Started deploy [design/style-guide@2ad7b1a]: Deploy design/style-guide:
  • 04:17 vgutierrez: Move cp3062 from nginx to ats-tls - T231627
  • 04:00 vgutierrez: Move cp3060 from nginx to ats-tls - T231627
  • 01:35 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/Handler/CompareHandler.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 53s)
  • 01:33 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/Rest/coreRoutes.json: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 52s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.35.0-wmf.5/includes/parser/Parser.php: deploying REST compare section feature because iOS team need it for a beta release due very soon (duration: 00m 54s)

2019-11-14

  • 23:03 mutante: restarting gerrit to increase defaultThreadPoolSize to 2
  • 22:29 eileen: civicrm revision changed from a3714003ff to ae9b3819cd, config revision is 6adc66a20b
  • 21:32 ssastry@deploy1001: Finished deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415 (duration: 08m 21s)
  • 21:24 ssastry@deploy1001: Started deploy [parsoid/deploy@150f9af]: Updating Parsoid to 74203415
  • 21:14 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:06 cdanis@cumin2001: dbctl commit (dc=all): 'remove now-defunct wikitech section T233236', diff saved to https://phabricator.wikimedia.org/P9639 and previous config saved to /var/cache/conftool/dbconfig/20191114-200649-cdanis.json
  • 20:04 gehel: reloading data on wdqs1004 from wdqs1007 to catch up on lag faster - T238229
  • 19:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:31 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:49 catrope@deploy1001: Synchronized wmf-config/: Use s10/s11 dblists for wikitechs (for real this time) (T233236) (duration: 00m 52s)
  • 18:37 catrope@deploy1001: Synchronized dblists/: Use s10/s11 dblists for wikitechs (T233236) (duration: 00m 51s)
  • 18:35 catrope@deploy1001: Synchronized dblists/: Add s10/s11 dblists for wikitechs (T233236) (duration: 00m 52s)
  • 18:34 mutante: scandium - restart php7.2-fpm
  • 18:31 mutante: phabricator (phab1003, prod server) - upgrade PHP version to 7.2.24 (T237239)
  • 18:17 cdanis@cumin2001: dbctl commit (dc=all): 'alias wikitech section to new s10 section T233236', diff saved to https://phabricator.wikimedia.org/P9638 and previous config saved to /var/cache/conftool/dbconfig/20191114-181732-cdanis.json
  • 17:46 robh: running dell epsa tool on cp3056 per T236497
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:22 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 17:22 ejegg: updated payments-wiki from bd907656fb to 30579d34d8
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 17:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 16:09 mutante: phab2001 - upgrading PHP version to 7.2.24 (T237239)
  • 16:06 mutante: scandium - upgrading PHP version to 7.2.24 (fyi, @subbu T228069) (T237239)
  • 16:04 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/Wikibase: Put a layer of APC cache on top of reading wb_terms in SqlEntityInfoBuilder (T231011 T229407 T236681), Try II (duration: 00m 56s)
  • 14:54 ema: pool cp3060 with ATS backend T227432
  • 14:53 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix bug when looking up entity for an unknown ID (duration: 00m 53s)
  • 14:48 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group1 for T198312 (duration: 00m 53s)
  • 14:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: depool cp3060 and reimage as text_ats T227432
  • 13:37 ladsgroup@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 13:35 gehel: depool wdqs1004 to allow catching up on lag - T238229
  • 13:06 bblack: removing digicert-2019 files from cache nodes - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/550829/
  • 12:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation (duration: 14m 52s)
  • 12:09 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae]: Fix /metrics/mediarequests/top/ indentation
  • 11:58 mobrovac@deploy1001: Finished deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation (duration: 02m 50s)
  • 11:55 mobrovac@deploy1001: Started deploy [restbase/deploy@58cf5ae] (dev-cluster): Fix /metrics/mediarequests/top/ indentation
  • 11:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:48 vgutierrez: Rolling restart of ats-tls/ats-backend to upgrade to 8.0.5-1wm11 - T238307
  • 10:44 vgutierrez: uploaded trafficserver-8.0.5-1wm11 to apt.wikimedia.org (stretch) - T238307
  • 10:43 ema: pool cp3058 with ATS backend T227432
  • 10:25 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:20 godog: netbox1001 bandaid/symlink /srv/deployment/netbox/deploy/src/netbox/project-static to 'static'
  • 10:06 gehel: copying journal from wdqs1007 to wdqs1005 - T238232
  • 10:05 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 10:03 Urbanecm: Run deleteEqualMessages.php --delete for cswiki and viwiki
  • 09:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 gehel: depool wdqs (public) eqiad - high lag - T238229
  • 09:34 ema: depool cp3058 and reimage as text_ats T227432
  • 09:31 marostegui: Compare wikidatawiki.pagelinks between labsdb1011 and labsdb1010 - T233986
  • 09:25 moritzm: installing ghostscript updates on thumbor1001
  • 09:24 marostegui: Stop mysql on db2067 to clone db21133 - T238183
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Full weight to db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9635 and previous config saved to /var/cache/conftool/dbconfig/20191114-092006-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 marostegui: Compare wikidatawiki.pagelinks between db1124:3318 and labsdb1010 - T233986
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 marostegui: Remove ar_comment from triggers on db1124:3315 - T234704
  • 08:41 marostegui: Deploy schema change with replication on db1082, this will generate lag on s5 labs - T233135 T234066
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P9634 and previous config saved to /var/cache/conftool/dbconfig/20191114-084043-marostegui.json
  • 08:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P9633 and previous config saved to /var/cache/conftool/dbconfig/20191114-083729-marostegui.json
  • 08:03 eileen: process-control config revision is 6adc66a20b re-enable backfill
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool a non partitioned slave db1089 on special groups for s1 T223151', diff saved to https://phabricator.wikimedia.org/P9632 and previous config saved to /var/cache/conftool/dbconfig/20191114-080038-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 T235599', diff saved to https://phabricator.wikimedia.org/P9631 and previous config saved to /var/cache/conftool/dbconfig/20191114-075449-marostegui.json
  • 07:41 eileen: process-control config revision is b7c2cf7227 - disabled backfill again - some error?
  • 07:29 eileen: process-control config revision is 909108622d re-enable omnirecipient date repair job
  • 07:25 eileen: process-control config revision is d3ebeddcc1 (I re-enabled the old backfill job)
  • 07:12 moritzm: installing intel-microcode updates
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1067', diff saved to https://phabricator.wikimedia.org/P9630 and previous config saved to /var/cache/conftool/dbconfig/20191114-065309-marostegui.json
  • 06:16 marostegui: Stop replication on db1067
  • 06:01 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1083 to s1 master and remove read-only from s1 T234800', diff saved to https://phabricator.wikimedia.org/P9629 and previous config saved to /var/cache/conftool/dbconfig/20191114-060138-marostegui.json
  • 06:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T234800', diff saved to https://phabricator.wikimedia.org/P9628 and previous config saved to /var/cache/conftool/dbconfig/20191114-060026-marostegui.json
  • 06:00 marostegui: Starting s1 failover from db1067 to db1083 - T234800
  • 05:51 jynus: stopping db1114 replication
  • 05:34 marostegui: Compress db2089:3316 - T235599
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P9627 and previous config saved to /var/cache/conftool/dbconfig/20191114-052400-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P9626 and previous config saved to /var/cache/conftool/dbconfig/20191114-052303-marostegui.json
  • 05:13 marostegui: Move replicas from db1067 to db1083 T234800
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1083 with weight 0 T234800', diff saved to https://phabricator.wikimedia.org/P9625 and previous config saved to /var/cache/conftool/dbconfig/20191114-050940-marostegui.json
  • 05:08 vgutierrez: Repooling cp1077 - T238289
  • 05:07 marostegui: Start pre-failover steps T234800
  • 05:01 kart_: Updated cxserver to 2019-11-13-111130-production tag (T237379, T235748, T236906)
  • 04:56 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:51 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 04:49 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 03:49 vgutierrez: power cycling cp1077 - T238289
  • 03:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
  • 03:49 vgutierrez: depooling cp1077 - T238289
  • 00:41 ebernhardson: T237849 Start CirrusSearch forceSearchIndex.php commonswiki 2019-10-20T00:00:00 - 2019-11-14T01:00:00 pushing into jobqueue
  • 00:40 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 49s)
  • 00:39 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:39 crusnov@deploy1001: Finished deploy [netbox/deploy@56df4a5]: deploy netbox for script update (duration: 00m 44s)
  • 00:38 crusnov@deploy1001: Started deploy [netbox/deploy@56df4a5]: deploy netbox for script update
  • 00:36 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php: T237849: Restore CirrusSearchBuildDocumentParse hook (duration: 00m 54s)

2019-11-13

  • 23:00 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:58 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:25 catrope@deploy1001: Finished scap: For some reason that limited i18n sync didn't work, trying a full scap (duration: 18m 33s)
  • 22:07 catrope@deploy1001: Started scap: For some reason that limited i18n sync didn't work, trying a full scap
  • 22:04 catrope@deploy1001: scap sync-l10n completed (1.35.0-wmf.5) (duration: 02m 54s)
  • 22:00 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Update to master (b937dce) (duration: 00m 54s)
  • 20:17 XioNoX: delete unused asw2-esams:ae1
  • 19:37 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (again) (duration: 00m 52s)
  • 18:49 Jeff_Green: authdns-update to remove host alnilam
  • 17:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update WD item blacklist (duration: 00m 53s)
  • 16:41 gehel: depool wdqs1005 - T238232
  • 16:36 gehel: restart blazegraph on wdqs1005
  • 16:21 ema: pool cp3054 with ATS backend T227432
  • 16:21 gehel: draining elastic1017-1031 to prepare for decommission - T230746
  • 16:02 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:00 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P9621 and previous config saved to /var/cache/conftool/dbconfig/20191113-155134-marostegui.json
  • 15:39 moritzm: powercycle cloudbackup2002
  • 15:35 ema: depool cp3054 and reimage as text_ats T227432
  • 15:32 moritzm: rebooting cloudbackup2002
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:29 jynus: shutdown db2072 T237905
  • 15:29 gehel: configuration of new elasticsearch servers completed, all working and pooled - T230746
  • 14:55 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9620 and previous config saved to /var/cache/conftool/dbconfig/20191113-145541-jynus.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P9619 and previous config saved to /var/cache/conftool/dbconfig/20191113-134938-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9618 and previous config saved to /var/cache/conftool/dbconfig/20191113-134625-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089 after upgrade', diff saved to https://phabricator.wikimedia.org/P9617 and previous config saved to /var/cache/conftool/dbconfig/20191113-133410-marostegui.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for upgrade', diff saved to https://phabricator.wikimedia.org/P9616 and previous config saved to /var/cache/conftool/dbconfig/20191113-132216-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P9615 and previous config saved to /var/cache/conftool/dbconfig/20191113-131530-marostegui.json
  • 11:56 effie: Upgrade to php 7.2.24-1 mediawiki eqiad hosts and restart php-fpm - T237239
  • 11:55 ema: cp-ats: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:46 moritzm: rebooting cloudcontrol2001-dev for microcode debugging
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:38 moritzm: rebooting labtestpuppetmaster2001 for microcode debugging
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:27 ema: cp-ats-ulsfo: rolling trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:27 moritzm: rebooting cloudcontrol2003-dev for some microcode debugging
  • 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:24 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9614 and previous config saved to /var/cache/conftool/dbconfig/20191113-110802-marostegui.json
  • 11:05 Urbanecm: EU SWAT done
  • 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ffwiki* (T238191)
  • 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 0a90ef9: Update localized logos for the Fula Wikipedia (T238191) (duration: 00m 54s)
  • 10:53 vgutierrez: Testing ats-tls-restart on cp5007 - T237425
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9613 and previous config saved to /var/cache/conftool/dbconfig/20191113-104326-marostegui.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9612 and previous config saved to /var/cache/conftool/dbconfig/20191113-103225-marostegui.json
  • 10:27 gehel: start configuration of new elasticsearch servers - T230746
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9610 and previous config saved to /var/cache/conftool/dbconfig/20191113-102054-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9609 and previous config saved to /var/cache/conftool/dbconfig/20191113-101127-marostegui.json
  • 09:51 jynus: upgraded wmf-mariadb101-client on cumin hosts
  • 09:50 mobrovac@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:43 mobrovac@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 09:41 mobrovac@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
  • 09:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374 (duration: 11m 19s)
  • 09:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8]: Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki - T229015 T238117 T238116 T237374
  • 09:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki (duration: 02m 35s)
  • 09:06 mobrovac@deploy1001: Started deploy [restbase/deploy@1f2c7d8] (dev-cluster): Start storing Parsoid/PHP results; add gcrwiki, shywiktionary, szywiki
  • 08:25 marostegui: Stop MySQL on db2062 to copy its data to db2132 T238183
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:09 marostegui: Fix replication on labsdb1010 - T233986
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P9607 and previous config saved to /var/cache/conftool/dbconfig/20191113-070339-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 for compression', diff saved to https://phabricator.wikimedia.org/P9606 and previous config saved to /var/cache/conftool/dbconfig/20191113-070055-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3317 after compression', diff saved to https://phabricator.wikimedia.org/P9605 and previous config saved to /var/cache/conftool/dbconfig/20191113-065952-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P9604 and previous config saved to /var/cache/conftool/dbconfig/20191113-065823-marostegui.json
  • 06:25 volker-e@deploy1001: Finished deploy [design/style-guide@edce4cc]: Deploy design/style-guide: (duration: 00m 08s)
  • 06:25 volker-e@deploy1001: Started deploy [design/style-guide@edce4cc]: Deploy design/style-guide:
  • 01:35 eileen: civicrm revision changed from 3c15db25bb to a3714003ff, config revision is d678dbcaa5

2019-11-12

  • 23:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fix: Do not return after inserting a single suggestion (duration: 00m 52s)
  • 23:51 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/src/mediawiki.interface.helpers.styles.less: Remove extraneous semicolons (T233649), part 2 (duration: 00m 52s)
  • 23:49 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes/changes/ChangesList.php: Remove extraneous semicolons (T233649), part 1 (duration: 00m 53s)
  • 23:49 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:45 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:22 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:20 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:37 bblack: repool cp1076 (experiments concluded)
  • 22:35 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: enabling REST API (duration: 00m 52s)
  • 22:34 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: enabling REST API (duration: 00m 52s)
  • 22:32 eileen: civicrm revision changed from bfa53ee611 to 3c15db25bb, config revision is d678dbcaa5
  • 21:54 bblack: depooling cp1076 for some local experimentation
  • 20:18 herron: reprepro copy buster-wikimedia stretch-wikimedia prometheus-elasticsearch-exporter
  • 20:11 otto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:11 otto@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:46 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P7007 --new-data-type external-id (T234221)
  • 19:45 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P4839 --new-data-type external-id (T234221)
  • 19:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Sync a previously undeployed change to InitialiseSettings-labs.php that someone forgot to deploy (as a no-op) in production (duration: 00m 52s)
  • 19:41 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set MCR migration stage to NEW on group0 for T198312 (duration: 00m 52s)
  • 19:19 arlolra: Updated Parsoid to 6a0a708 (T215000, T235295, T235656, T235217, T235295, T236846, T237556, T235231)
  • 19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708 (duration: 10m 09s)
  • 18:58 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Final fixes and tweaks for testing (duration: 00m 53s)
  • 18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f516018]: Updating Parsoid to 6a0a708
  • 18:39 ejegg: re-enabled Omnimail and contact de-duplication jobs
  • 18:20 Urbanecm: Morning SWAT done
  • 18:18 Urbanecm: Deploy security patch for T237887
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 130ef87: Add right "abusefilter-log-private" to usergroup "rollbacker" at ptwiki (T237830) (duration: 00m 53s)
  • 18:08 XioNoX: push pfw change to add recdns anycast IP
  • 17:33 XioNoX: update fasw-c-eqiad to match current standard (ntp/users/rootpw/lldp)
  • 17:22 XioNoX: update fasw-c-codfw to match current standard (ntp/users/rootpw/lldp)
  • 17:03 ema: pool cp3052 with ATS backend T238085
  • 17:03 ema: pool cp3052 with ATS backend T227432
  • 16:53 bblack: cpNNNN (all cache nodes) - cumin manual removal of globalsign-2018 remnants (key, cert, ocsp config, ocsp output)
  • 16:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 XioNoX: setup bgp session from cr2-codfw to multihop RIS collector - T106056
  • 16:21 XioNoX: reboot scs-c1-eqiad.mgmt.eqiad.wmnet - T238036
  • 16:09 ema: depool cp3052 and observe performance impact T238085 before reimaging as text_ats T227432
  • 15:49 marostegui: Deploy schema change on db1102:3315 T233135 T234066
  • 15:45 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Fixes and tweaks for initial rollout (duration: 00m 53s)
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for a schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9600 and previous config saved to /var/cache/conftool/dbconfig/20191112-154127-marostegui.json
  • 15:24 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=schema
  • 14:46 bblack: cpNNNN (all caches): remove stale outputs from transient ocsp failures ( /var/cache/ocsp/update-ocsp-*.tmp )
  • 14:41 ema: cp4022: trafficserver (8.0.5-1wm10) and fifo-log-demux (0.6) upgrade and restart
  • 14:38 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4021.ulsfo.wmnet,service=nginx
  • 14:35 ema: cp4021: ats-tls-restart to see if https://gerrit.wikimedia.org/r/550475 fixed the script
  • 14:16 Jeff_Green: authdns-update to deploy fundraising-read.wmnet service cname adjustment
  • 14:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set all of wikidata for write both for term store" (duration: 00m 52s)
  • 12:57 godog: refresh kibana field list
  • 12:46 gehel: repool wdqs1004
  • 12:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size 100 (T237984)
  • 12:19 onimisionipe: restarting blazegraph on wdqs1005
  • 12:11 effie: Reimage mwdebug1002 - T214734
  • 11:47 Amir1: EU SWAT is done
  • 11:47 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase term store error reduction, Do not catch DBError in ReplicaMasterAwareRecordIdsAcquirer. (T236466) (duration: 00m 56s)
  • 11:44 effie: Upgrade wtp* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 11:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata for write both for term store (T225055) (duration: 00m 52s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SECURITY: Don't allow Wikimedia sysops to see who had 2FA disabled (duration: 00m 53s)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9599 and previous config saved to /var/cache/conftool/dbconfig/20191112-104400-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9598 and previous config saved to /var/cache/conftool/dbconfig/20191112-103641-marostegui.json
  • 10:35 onimisionipe: resetting cronfile on wdqs hosts
  • 10:33 marostegui: Drop labtestwiki database from m5 master db1133 - T236010
  • 10:30 marostegui: Deploy schema change on dbstore1003:3315
  • 10:07 ema: repool cp3065, nothing interesting in kern.log and SEL T238032
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9596 and previous config saved to /var/cache/conftool/dbconfig/20191112-095221-marostegui.json
  • 09:42 marostegui: Remove privileges for labtestwiki on m5 - T236010
  • 09:27 gehel: restarting blazegraph on wdqs1004
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083', diff saved to https://phabricator.wikimedia.org/P9595 and previous config saved to /var/cache/conftool/dbconfig/20191112-091706-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for mariadb upgrade to 10.1.39 - T234800', diff saved to https://phabricator.wikimedia.org/P9594 and previous config saved to /var/cache/conftool/dbconfig/20191112-091158-marostegui.json
  • 09:11 marostegui: Upgrade mariadb to 10.1.39 on db1083 (candidate master for s1)
  • 08:56 moritzm: restarting archiva to pick up Java security updates
  • 08:44 volker-e@deploy1001: Finished deploy [design/style-guide@3de6820]: Deploy design/style-guide: (duration: 00m 06s)
  • 08:44 volker-e@deploy1001: Started deploy [design/style-guide@3de6820]: Deploy design/style-guide:
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1083', diff saved to https://phabricator.wikimedia.org/P9593 and previous config saved to /var/cache/conftool/dbconfig/20191112-083720-marostegui.json
  • 08:37 gehel: depool wdqs1004 to investigate update lag
  • 08:35 moritzm: installing poppler security updates
  • 08:24 volker-e@deploy1001: Finished deploy [design/style-guide@b926b95]: Deploy design/style-guide: (duration: 00m 07s)
  • 08:24 volker-e@deploy1001: Started deploy [design/style-guide@b926b95]: Deploy design/style-guide:
  • 08:15 moritzm: installing curl security updates
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9592 and previous config saved to /var/cache/conftool/dbconfig/20191112-081322-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic to db1083', diff saved to https://phabricator.wikimedia.org/P9591 and previous config saved to /var/cache/conftool/dbconfig/20191112-074006-marostegui.json
  • 07:36 elukey: remove /etc/logrotate.d/wdqs_autodeployment_log from wdqs1009 (not in puppet anymore and causing cronspam)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1083 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P9590 and previous config saved to /var/cache/conftool/dbconfig/20191112-072823-marostegui.json
  • 07:10 marostegui: Upgrade kernel on db1083 (s1 candidate master)
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 for kernel upgrade - T234800', diff saved to https://phabricator.wikimedia.org/P9589 and previous config saved to /var/cache/conftool/dbconfig/20191112-070436-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:44 marostegui: Change triggers on s5 db2094 - T234704
  • 06:40 marostegui: Deploy schema change on s5 codfw with replication, this will generate lag on s5 codfw T233135 T234066
  • 06:21 marostegui: Compress db2087:3316, db2087:3317 T235599
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for compression - T235599', diff saved to https://phabricator.wikimedia.org/P9588 and previous config saved to /var/cache/conftool/dbconfig/20191112-061959-marostegui.json
  • 03:41 vgutierrez: restart wdqs-blazegraph on wdqs1004
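
Entries in this log share one shape: a bullet, an HH:MM time, an actor (optionally `user@host`), a colon, and a free-text message that may mention Phabricator tasks such as T233135. The following is a minimal sketch of how a line could be split into those parts. The function name `parse_entry` and the dictionary keys are hypothetical choices made for illustration and are not part of any Wikitech tooling.

```python
import re

# Assumed entry layout, inferred from the lines above:
#   "  • HH:MM actor: message"
ENTRY_RE = re.compile(
    r"^\s*•\s*(?P<time>\d{2}:\d{2})\s+(?P<actor>[^:\s]+):\s*(?P<message>.*)$"
)
# Phabricator task references look like "T" followed by digits.
TASK_RE = re.compile(r"\bT\d+\b")


def parse_entry(line):
    """Return a dict with time, actor, message and tasks, or None for non-entries."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # date headings and blank lines fall through here
    entry = match.groupdict()
    entry["tasks"] = TASK_RE.findall(entry["message"])
    return entry


if __name__ == "__main__":
    sample = "  • 15:49 marostegui: Deploy schema change on db1102:3315 T233135 T234066"
    print(parse_entry(sample))
```

The actor pattern stops at the first colon, so an entry like the 19:46 one by Amir1, whose message itself contains `ladsgroup@mwmaint1002:~$`, still yields `Amir1` as the actor.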

2019-11-11

  • 22:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 22:49 ema: power-cycle cp3065, currently down
  • 19:36 XioNoX: disable ALGs on mr1-esams
  • 18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 00m 57s)
  • 18:19 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 18:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT (duration: 15m 14s)
  • 18:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@222b1c2]: New WDQS build - 0.3.6-SNAPSHOT
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 17:44 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:41 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:44 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 15:42 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 14:26 ema: pool cp3050 with ATS backend T227432
  • 13:50 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:48 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:25 ema: depool cp3050 and reimage as text_ats T227432
  • 12:59 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 12:46 effie: Upgrade to 7.2.24-1 mwdebug[2001-2002].codfw.wmnet,mwmaint2001.codfw.wmnet,deploy2001.codfw.wmnet - T237239
  • 12:31 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010 (duration: 00m 28s)
  • 12:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Deploy updates on wdqs1010
  • 12:28 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T237239
  • 12:21 effie: Upgrade mw2* to 7.2.24-1 with elegance and restart php-fpm - T231881
  • 11:55 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:52 hoo: Updated the Wikidata property suggester with data from the 2019-11-04 JSON dump and applied the T132839 workarounds
  • 10:48 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:47 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 10:32 vgutierrez: restarting ats-tls on cp1088
  • 10:21 jynus: upgrade mariadb on db2102
  • 10:16 ema: repool cp4027 after successful X-Wikimedia-Debug testing P9585 T237687
  • 10:12 jynus: manually run full backup of labtestpuppetmaster2001 T235819
  • 09:41 ema: test x-wikimedia-debug-routing.lua on cp4027 (depooled) T237687
  • 09:09 volker-e@deploy1001: Finished deploy [design/style-guide@0ea65f2]: Deploy design/style-guide: (duration: 00m 07s)
  • 09:09 volker-e@deploy1001: Started deploy [design/style-guide@0ea65f2]: Deploy design/style-guide:
  • 08:28 marostegui: Stop MySQL on db2048 before decommissioning - T237913
  • 08:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2048 from config T237913 (duration: 00m 51s)
  • 08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2048 from config T237913 (duration: 00m 54s)
  • 08:21 marostegui: Remove db2048 from tendril and zarcillo T237913
  • 06:56 elukey: delete /etc/logrotate.d/wdqs-reload-categories from wdqs* as attempt to reduce cronspam
  • 06:44 marostegui: Delete globalblocks table from napwikisource T230055
  • 05:27 vgutierrez: Switch from nginx to ats-tls on cp3058 - T231627
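
Cookbook runs appear as a START entry and a later END entry from the same actor, and because the log is newest first, the END line comes before its START. A rough sketch of pairing them to estimate run time in minutes is shown below. It assumes entries have already been split into `(time, actor, message)` tuples for a single day; the function name `cookbook_durations` is hypothetical, and overlapping runs of the same cookbook by the same actor, or runs crossing midnight, are not handled.

```python
from datetime import datetime


def cookbook_durations(entries):
    """Pair END/START cookbook entries given newest first; return minutes per run."""
    pending_end = {}  # (actor, cookbook name) -> END time still waiting for its START
    durations = []
    for time_str, actor, message in entries:
        if "Cookbook " not in message:
            continue
        # The cookbook name is the first token after the word "Cookbook".
        name = message.split("Cookbook ", 1)[1].split()[0]
        when = datetime.strptime(time_str, "%H:%M")
        key = (actor, name)
        if message.startswith("END"):
            pending_end[key] = when
        elif message.startswith("START") and key in pending_end:
            end = pending_end.pop(key)
            minutes = int((end - when).total_seconds() // 60)
            durations.append((actor, name, minutes))
    return durations


if __name__ == "__main__":
    day = [
        ("17:53", "elukey@cumin1001", "END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)"),
        ("17:44", "jeh@cumin1001", "END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)"),
        ("17:41", "jeh@cumin1001", "START - Cookbook sre.hosts.downtime"),
        ("15:44", "elukey@cumin1001", "START - Cookbook sre.kafka.roll-restart-brokers"),
    ]
    for row in cookbook_durations(day):
        print(row)
```

Since the log records only minutes, the result is accurate to about a minute at best.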

2019-11-09

  • 20:25 reedy@deploy1001: Synchronized langlist-labs: T237823 (duration: 00m 54s)
  • 02:39 volker-e@deploy1001: Finished deploy [design/style-guide@d2bfc09]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:39 volker-e@deploy1001: Started deploy [design/style-guide@d2bfc09]: Deploy design/style-guide:
  • 01:07 volker-e@deploy1001: Finished deploy [design/style-guide@ef82b69]: Deploy design/style-guide: (duration: 00m 07s)
  • 01:07 volker-e@deploy1001: Started deploy [design/style-guide@ef82b69]: Deploy design/style-guide:
  • 01:06 volker-e@deploy1001: Finished deploy [design/style-guide@97fb3ee]: Deploy design/style-guide: (duration: 00m 09s)
  • 01:06 volker-e@deploy1001: Started deploy [design/style-guide@97fb3ee]: Deploy design/style-guide:

2019-11-08

  • 20:26 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation request jobs by 5 mins for testing (duration: 00m 52s)
  • 16:54 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 jeh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 jeh@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "MachineVision: Enable testers-only mode on testcommonswiki for debugging" (duration: 00m 54s)
  • 15:57 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118, db1106 at 100%', diff saved to https://phabricator.wikimedia.org/P9582 and previous config saved to /var/cache/conftool/dbconfig/20191108-155700-jynus.json
  • 15:37 herron: beginning rolling service restarts on logstash hosts for java security updates
  • 15:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enable testers-only mode on testcommonswiki for debugging (duration: 00m 52s)
  • 14:56 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:55 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9581 and previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
  • 14:42 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: stop and upgrade percona-server on test host db1114
  • 13:27 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:12 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9580 and previous config saved to /var/cache/conftool/dbconfig/20191108-131257-jynus.json
  • 13:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee2027c: Change the language of Votewiki back to English (en) (T230614) (duration: 00m 54s)
  • 12:34 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:14 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 10%', diff saved to https://phabricator.wikimedia.org/P9578 and previous config saved to /var/cache/conftool/dbconfig/20191108-121444-jynus.json
  • 12:02 jynus: update and restart db1118
  • 12:01 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1118 fully', diff saved to https://phabricator.wikimedia.org/P9577 and previous config saved to /var/cache/conftool/dbconfig/20191108-120138-jynus.json
  • 11:55 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9576 and previous config saved to /var/cache/conftool/dbconfig/20191108-115553-jynus.json
  • 11:27 jynus@cumin1001: dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9575 and previous config saved to /var/cache/conftool/dbconfig/20191108-112733-jynus.json
  • 11:25 jynus@cumin1001: dbctl commit (dc=all): 'repool db2130', diff saved to https://phabricator.wikimedia.org/P9574 and previous config saved to /var/cache/conftool/dbconfig/20191108-112503-jynus.json
  • 11:12 jynus: update and restart db2130
  • 11:11 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2116, depool db2130', diff saved to https://phabricator.wikimedia.org/P9573 and previous config saved to /var/cache/conftool/dbconfig/20191108-111125-jynus.json
  • 10:58 Amir1: running rebuildItemTerms on 8028 items (T234329)
  • 10:51 jynus: update and restart db2116
  • 10:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2103, depool db2116', diff saved to https://phabricator.wikimedia.org/P9572 and previous config saved to /var/cache/conftool/dbconfig/20191108-105013-jynus.json
  • 10:38 jynus: update and restart db2103
  • 10:34 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephmon[1-3] T228102
  • 10:33 jeh: enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephosd[1-3] T224188
  • 10:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2092, depool db2103', diff saved to https://phabricator.wikimedia.org/P9571 and previous config saved to /var/cache/conftool/dbconfig/20191108-103218-jynus.json
  • 10:19 jynus: update and restart db2092
  • 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2071, depool db2092', diff saved to https://phabricator.wikimedia.org/P9570 and previous config saved to /var/cache/conftool/dbconfig/20191108-101759-jynus.json
  • 10:09 elukey: restart jvm-based hadoop daemons on an-master100[1,2] to pick up the new openjdk version
  • 10:06 jynus: update and restart db2071
  • 10:03 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P9569 and previous config saved to /var/cache/conftool/dbconfig/20191108-100310-jynus.json
  • 10:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2072', diff saved to https://phabricator.wikimedia.org/P9568 and previous config saved to /var/cache/conftool/dbconfig/20191108-100128-jynus.json
  • 09:50 moritzm: uploaded openjdk 8u232-b09-1~deb10u1 to component/jdk8 for apt.wikimedia.org/buster-wikimedia
  • 09:41 jynus: update and restart db2072
  • 09:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9567 and previous config saved to /var/cache/conftool/dbconfig/20191108-094100-jynus.json
  • 09:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9566 and previous config saved to /var/cache/conftool/dbconfig/20191108-093958-jynus.json
  • 09:35 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 09:29 jynus: update and restart db2094
  • 09:27 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9565 and previous config saved to /var/cache/conftool/dbconfig/20191108-092735-jynus.json
  • 09:10 jynus: update and restart db1106
  • 09:08 moritzm: installing Java security updates on kafka-jumbo
  • 09:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 fully', diff saved to https://phabricator.wikimedia.org/P9564 and previous config saved to /var/cache/conftool/dbconfig/20191108-090746-jynus.json
  • 09:05 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 09:04 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9563 and previous config saved to /var/cache/conftool/dbconfig/20191108-090451-jynus.json
  • 09:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9562 and previous config saved to /var/cache/conftool/dbconfig/20191108-090012-jynus.json
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:52 jynus: stop and upgrade db1124 (may create temporary lag on wikireplicas)
  • 08:31 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:23 elukey: restart kafka on kafka-jumbo1001 to test the new openjdk
  • 08:07 moritzm: installing fribidi security updates on Buster
  • 03:03 vgutierrez: Switch from nginx to ats-tls on cp3054 - T231627
  • 02:42 vgutierrez: Switch from nginx to ats-tls on cp3052 - T231627
  • 01:23 reedy@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GlobalBlocking/: Prevent some extra db queries (duration: 00m 53s)
  • 01:14 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Use internationalized semicolon separators (T233649) (duration: 00m 53s)
  • 01:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic (duration: 03m 04s)
  • 01:06 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic
  • 00:44 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logging.js: Fix homepage instrumentation (T237600) (duration: 00m 52s)
  • 00:40 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/includes: Sync DiffEngine changes that were needed to unbreak CI (duration: 00m 55s)
  • 00:34 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/resources/: Semicolon should appear after log entries (T237500) (duration: 00m 53s)
  • 00:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix remote API configs for GrowthExperiments (duration: 00m 51s)
  • 00:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable suggested edits as hidden preference on arwiki, cswiki, kowiki, viwiki (T236968) (duration: 00m 53s)
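
The dbctl commit entries above follow a fixed template: the datacenter scope, a quoted summary, a Phabricator paste URL for the diff, and a saved-config path whose file name embeds a timestamp and user. As a sketch only, the fields could be pulled out as follows. The regular expression is inferred from the lines in this log rather than from dbctl itself, and `parse_dbctl` and its key names are hypothetical.

```python
import re
from datetime import datetime

# Template inferred from the log lines, e.g.:
#   dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to <url> and
#   previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<summary>.*)', "
    r"diff saved to (?P<diff_url>\S+) and previous config saved to "
    r"\S*/(?P<stamp>\d{8}-\d{6})-(?P<user>[^/]+)\.json"
)


def parse_dbctl(message):
    """Extract dc, summary, diff URL, user and snapshot time from a dbctl commit message."""
    match = DBCTL_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # The file name stamp is YYYYMMDD-HHMMSS.
    info["saved_at"] = datetime.strptime(info.pop("stamp"), "%Y%m%d-%H%M%S")
    return info


if __name__ == "__main__":
    msg = (
        "dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to "
        "https://phabricator.wikimedia.org/P9581 and previous config saved to "
        "/var/cache/conftool/dbconfig/20191108-145028-jynus.json"
    )
    print(parse_dbctl(msg))
```

Sorting the parsed results by `saved_at` gives the step-by-step weight changes for a host, such as the 10%, 20%, 50%, 100% repool of db1118, in chronological order.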

2019-11-07

  • 23:49 foks: removing one file for legal compliance
  • 23:47 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert phatality again (duration: 03m 04s)
  • 23:44 shdubsh: start elasticsearch on logstash1008
  • 23:44 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert phatality again
  • 23:41 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: one more time (duration: 03m 00s)
  • 23:38 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: one more time
  • 23:31 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout (duration: 03m 02s)
  • 23:28 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: trying again with a longer scap timeout
  • 23:23 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: revert to previous phatality plugin version (duration: 02m 55s)
  • 23:20 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: revert to previous phatality plugin version
  • 23:09 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 00m 06s)
  • 23:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 23:04 twentyafterfour@deploy1001: Finished deploy [releng/phatality@11d4ad8]: (no justification provided) (duration: 06m 48s)
  • 23:00 XenoRyet: updated payments-wiki from aac3d93f70 to bd907656fb
  • 22:57 twentyafterfour@deploy1001: Started deploy [releng/phatality@11d4ad8]: (no justification provided)
  • 22:53 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes (duration: 00m 05s)
  • 22:53 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update responsive Illustrations styles changes
  • 22:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Remove annotation job delay (duration: 00m 53s)
  • 22:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions (duration: 00m 06s)
  • 22:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Update to latest master with components overview additions
  • 21:54 andrewbogott: rebuilding labtestpuppetmaster2001 w/Stretch
  • 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
  • 21:28 mutante: boron apt-get clean (saved 9G on /) (T237649)
  • 20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.5 refs T233853
  • 20:24 catrope@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ArticleTarget.js: Fix error handling (duration: 01m 00s)
  • 20:21 herron: performing rolling reboots of kafka-main hosts for security updates
  • 20:17 onimisionipe: cluster restart for cloudelastic to pick JVM upgrade
  • 20:08 eileen: civicrm revision changed from f1ce5c86f7 to bfa53ee611, config revision is 72d2692743
  • 19:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enqueue annotation job on upload complete (duration: 05m 19s)
  • 18:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Disable retrying annotation requests (duration: 05m 17s)
  • 18:25 ebernhardson: restart mjolnir-kafka-bulk-daemon and mjolnir-kafka-msearch-daemon across `cirrus` dsh group
  • 18:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration (duration: 05m 49s)
  • 18:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@afd41d7]: bulk_daemon: Adjust glent configuration
  • 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:38 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 17:30 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 17:25 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Drop currently unsupported external dependencies (T227349) (duration: 05m 19s)
  • 17:10 XioNoX: Homer push - forwarding-options - to all cr
  • 17:09 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:08 XioNoX: add sampling stanza (disabled) to cr2-esams
  • 17:00 mutante: wtp2020 - 2 hours downtime - shut down (T205712) - go ahead @papaul
  • 17:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 16:58 mutante: wtp2020 - depooled for T205712
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp2020.codfw.wmnet
  • 16:42 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: some alphasorted config (duration: 01m 00s)
  • 16:34 XioNoX: Homer push on cr2-knams: Sampling (disabled), enhanced-hash-key, ospf interfaces re-ordering (noop), policy-statement BGP_from_LVS (unused), lo0 term allow_vmhost
  • 16:32 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 100%', diff saved to https://phabricator.wikimedia.org/P9553 and previous config saved to /var/cache/conftool/dbconfig/20191107-163235-jynus.json
  • 16:20 XioNoX: add BGP sessions to AS64050 in eqiad
  • 16:15 XioNoX: add BGP sessions to AS57695 in esams and eqiad
  • 16:12 XioNoX: clear v4 BGP sessions to AS7713 in eqsin (hit max prefix limit)
  • 16:02 mutante: mw2225 restart cron (T236799)
  • 15:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta logging (duration: 01m 00s)
  • 15:41 XioNoX: remove BGP to AS3491 on eqiad (left the IX)
  • 15:40 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:53 jbond42: rebuilding compiler1001
  • 13:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 50%', diff saved to https://phabricator.wikimedia.org/P9551 and previous config saved to /var/cache/conftool/dbconfig/20191107-135018-jynus.json
  • 12:47 Urbanecm: EU SWAT done
  • 12:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 8e71601: a36ed85: GrowthExperiments: Configure testwiki for suggested edits testing + follow up patch (T237634) (duration: 00m 59s)
  • 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 19034af: GrowthExperiments: Configure intro links for suggested edits (T235723) (duration: 01m 00s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2be3f86: [cirrus] remove cross_cluster_single_shard_search quirk (duration: 01m 02s)
  • 12:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5253dec: Give commonswiki filemovers `suppressredirect` rights (T236348) (duration: 01m 03s)
  • 11:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 fully (duration: 01m 01s)
  • 11:54 jbond42: update puppet_version used by CI 545289
  • 11:50 jbond42: rebuilding compiler1002
  • 11:36 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1080 at 10%', diff saved to https://phabricator.wikimedia.org/P9550 and previous config saved to /var/cache/conftool/dbconfig/20191107-113611-jynus.json
  • 11:16 jynus: stop and upgrade db1080
  • 10:58 moritzm: installing Java security updates on kafka-main/logstash
  • 10:50 moritzm: installing Java security updates on wdqs/maps
  • 10:46 jynus@cumin1001: dbctl commit (dc=all): 'Fully depool db1080', diff saved to https://phabricator.wikimedia.org/P9549 and previous config saved to /var/cache/conftool/dbconfig/20191107-104618-jynus.json
  • 10:28 moritzm: upgrading mw1277-1279 servers to PHP 7.2.24 T237239
  • 10:27 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1080 weight', diff saved to https://phabricator.wikimedia.org/P9548 and previous config saved to /var/cache/conftool/dbconfig/20191107-102747-jynus.json
  • 09:41 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1016 with low weight (duration: 01m 02s)
  • 09:30 jynus: stop and upgrade es1016
  • 09:18 moritzm: installing Java security updates on aqs/druid/Hadoop
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1016 (duration: 01m 04s)
  • 09:03 jynus: stop and upgrade es2012, es2014
  • 08:48 jynus: stop and upgrade es2011
  • 08:30 jynus: upgrade and restart db2093
  • 00:21 XioNoX: enable interface damping on primary eqsin-codfw link - T236878
  • 00:09 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/549227 (duration: 01m 00s)
  • 00:00 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 04m 29s)

2019-11-06

  • 23:56 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 23:55 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads (duration: 14m 56s)
  • 23:40 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@d2ad2da]: bulk_daemon: support ltr model uploads
  • 22:36 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on commonswiki (T227349)
  • 22:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 22:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on commonswiki (T227349) (duration: 01m 00s)
  • 22:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation jobs on commonswiki only (duration: 01m 01s)
  • 22:17 mdholloway: created MachineVision extension tables on commonswiki
  • 22:13 XioNoX: push standard forwarding-options to cr3/4-ulsfo
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 22:04 mholloway-shell@deploy1001: Synchronized private/PrivateSettings.php: Configure Google Cloud Vision API credentials (2/2) (T236426) (duration: 00m 59s)
  • 22:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1247.eqiad.wmnet
  • 22:03 mholloway-shell@deploy1001: Synchronized private/GoogleCloudVision.php: Configure Google Cloud Vision API credentials (1/2) (T236426) (duration: 00m 59s)
  • 21:57 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/MachineVision: Allow specifying API credentials as an associative array (T236426) (duration: 01m 01s)
  • 21:53 thcipriani: checkout /srv/mediawiki-staging/php-1.35.0-wmf.5/maintenance/Maintenance.php looks like a local change for debugging left behind
  • 21:47 arlolra: Updated Parsoid to 1d283ed (T237104, T227209, T236865)
  • 21:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed (duration: 10m 22s)
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@7e86f83]: Updating Parsoid to 1d283ed
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1247.eqiad.wmnet
  • 21:14 XioNoX: push standard forwarding-options to cr3-esams
  • 21:12 milimetric@deploy1001: Finished deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns (duration: 10m 52s)
  • 21:01 milimetric@deploy1001: Started deploy [analytics/refinery@dc85f9d]: Hdfs Cleaner and TLS columns
  • 20:36 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.5/extensions/OpenStackManager/: sync openstackmanager to deploy https://gerrit.wikimedia.org/r/#/q/I5b08f0069941052acdd9f05a62aac5b2cf9ecdd5 (duration: 01m 00s)
  • 20:34 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.5 refs T233853 (duration: 01m 00s)
  • 20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.5 refs T233853
  • 19:05 mutante: mw1225 - re-enabling puppet (no reason given, nothing in SAL or Phab but disabled)
  • 18:43 mutante: LDAP - add dwisehaupt to wmf group (T235676)
  • 18:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix typo (T222117) (duration: 01m 00s)
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Instrument logging to ClosedWikiProvider (T222117) (duration: 01m 01s)
  • 17:22 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1126 weight, too much backlog', diff saved to https://phabricator.wikimedia.org/P9542 and previous config saved to /var/cache/conftool/dbconfig/20191106-172235-jynus.json
  • 17:21 ejegg: turned off donation queue consumer for financial_trxn record fix
  • 17:17 ejegg: updated Fundraising CiviCRM from 1c3be265ae to f1ce5c86f7
  • 17:15 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 fully (duration: 00m 59s)
  • 17:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WebAuthn extension if wmgUseWebAuthn is set (false in all of production) T227242 (duration: 01m 00s)
  • 17:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseWebAuthn false in all of production T227242 (duration: 01m 01s)
  • 17:08 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 fully', diff saved to https://phabricator.wikimedia.org/P9541 and previous config saved to /var/cache/conftool/dbconfig/20191106-170852-jynus.json
  • 16:11 mdholloway: MachineVision: Imported Freebase to Wikidata ID mappings on testcommonswiki (T227349)
  • 15:58 mdholloway: created MachineVision tables on testcommonswiki (T227349)
  • 15:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure MachineVision and enable on testcommonswiki (T227349) (duration: 01m 00s)
  • 15:47 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: MachineVision: Use an HTTP proxy in production (T236843) (duration: 01m 01s)
  • 15:42 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Do not restrict to testing users on Beta (duration: 01m 00s)
  • 15:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Fix Beta config with updated service name (duration: 01m 02s)
  • 14:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1019 with low weight (duration: 00m 59s)
  • 14:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable streaks and revert counts (T234955, T234956) (duration: 01m 00s)
  • 14:27 jynus: upgrade and restart es1019
  • 14:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 01m 00s)
  • 14:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 50%', diff saved to https://phabricator.wikimedia.org/P9539 and previous config saved to /var/cache/conftool/dbconfig/20191106-140702-jynus.json
  • 12:38 Urbanecm: EU SWAT done
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 2/2) (duration: 01m 00s)
  • 12:36 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a239b14: Allow certain users to create account at closed wikis (T222117; 1/2) (duration: 00m 59s)
  • 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3e9ede0: Add 104 (Cookbook) to $wgContentNamespaces for bnwikibooks (T236840) (duration: 01m 00s)
  • 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5875c45: [cirrus] Disable instant indexing on wikidata (duration: 01m 15s)
  • 11:57 jynus: upgrade and restart db2048
  • 11:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1074 at 10%', diff saved to https://phabricator.wikimedia.org/P9537 and previous config saved to /var/cache/conftool/dbconfig/20191106-113510-jynus.json
  • 11:14 jynus: stopping db1074 for maintenance (will create temporary s2 lag on wikireplicas)
  • 11:06 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P9536 and previous config saved to /var/cache/conftool/dbconfig/20191106-110603-jynus.json
  • 09:46 moritzm: upgrading mw1262-mw1265,mw1276 servers to PHP 7.2.24 T237239
  • 09:33 jynus: stop and upgrade labsdb1011 T236015
  • 09:25 jynus: depooling labsdb1011 for wikireplica service T236015
  • 09:10 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 08:58 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 08:51 jynus: upgrading wmf-mariadb101-client on cumin hosts
  • 08:51 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.24 T237239
  • 08:33 jynus: upgrading db2102 mariadb (test-s1)
  • 07:48 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: T233213 (duration: 11m 38s)
  • 07:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: T233213
  • 02:59 vgutierrez: Switch from nginx to ats-tls on cp5012 - T231627
  • 00:07 mdholloway: created table wikimedia_editor_tasks_edit_streak on x1/wikishared (T234956)

2019-11-05

  • 23:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.5 refs T233853
  • 23:25 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.5 refs T233853 (duration: 24m 13s)
  • 23:01 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:51 twentyafterfour@deploy1001: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2905573311"/* "/srv/mediawiki-staging/php-1.35.0-wmf.5/cache/l10n"' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:50 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:39 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2076118383" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 26s)
  • 22:38 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:17 twentyafterfour: scap failed with error: A copy of your installation's LocalSettings.php must exist and be readable in the source directory. Use --conf to specify it. refs T233853
  • 22:09 twentyafterfour@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_840646293" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 04m 54s)
  • 22:04 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.5 refs T233853
  • 22:03 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr2-esams:lo0.0
  • 21:58 XioNoX: remove 127.0.0.1/32 and ::1/128 from cr3-esams:lo0.0
  • 20:45 mutante: shutting down cobalt (formerly gerrit server)
  • 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:33 XioNoX: push fw policies to pfw3-eqiad - T236201
  • 20:23 XioNoX: push fw policies to pfw3-codfw - T236201
  • 20:17 joal@deploy1001: Finished deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch (duration: 08m 21s)
  • 20:09 joal@deploy1001: Started deploy [analytics/refinery@ea631bd]: Analytics deploy for spark upgrade - forgotten patch
  • 20:08 joal@deploy1001: Finished deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade (duration: 08m 49s)
  • 20:00 joal@deploy1001: Started deploy [analytics/refinery@8013a86]: Analytics deploy for spark upgrade
  • 18:40 twentyafterfour: MediaWiki train: start branching wmf/1.35.0-wmf.5
  • 18:30 XioNoX: fix typo on cr1-eqsin:lo0.0 v6 IP
  • 18:27 ejegg: updated payments-wiki from 0de9d96208 to aac3d93f70
  • 17:21 jynus: restarting etherpad
  • 16:56 arturo: deleted stretch-wikimedia/thirdparty/kubeadm-k8s and created buster-wikimedia/thirdparty/kubeadm-k8s
  • 16:24 papaul: Replacing disk on db2120
  • 15:37 jynus: deploying schema change on x1 T234955
  • 15:20 ema: cp4027: upgrade trafficserver to 8.0.5-1wm10
  • 14:37 jynus: reducing consistency temporarily on db1114 so it can catch up replication
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:58 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 ema: pool cp5012 with ATS backend T227432
  • 10:45 vgutierrez: restarting atsmtail@backend on cp5006
  • 09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 ema: wb2-phab stopped saying things a while ago. Restarted
  • 09:18 jynus: restart dbprov100[12] T236924
  • 09:11 jynus: restart dbprov2001 T236924
  • 08:12 vgutierrez: uploaded fifo-log-demux 0.6 to apt.wikimedia.org (stretch)
  • 08:02 jynus: redact mnwwiki on db1124 and db2094 T235743
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp5011 - T231627
  • 04:13 vgutierrez: Switch from nginx to ats-tls on cp5010 - T231627
  • 03:51 vgutierrez: pooling cp3057 - T237348
  • 03:46 mutante: wdqs1004 restarting wdqs-blazegraph
  • 03:01 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3057.esams.wmnet
  • 02:59 vgutierrez: depool cp3057 - T237348
  • 00:15 mutante: gerrit - restarting service to re-enable jgit gc (T217497)
  • 00:13 mutante: gerrit2001 - restart gerrit (replica)

2019-11-04

  • 23:18 milimetric@deploy1001: Finished deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs (duration: 07m 20s)
  • 23:11 milimetric@deploy1001: Started deploy [analytics/refinery@99f1535]: Fix for geoeditors jobs
  • 23:05 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:03 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:08 bd808: The Wikimedia SAL Twitter feed is now @wikimedia_sal (https://twitter.com/wikimedia_sal) T237322
  • 20:51 bd808: Testing twitter feed following account confirmation
  • 19:23 Urbanecm: Morning SWAT done
  • 19:17 mutante: cobalt - stopping services, removing apache2
  • 19:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6a4b966: Add throttle rule for bard college editathon (T236955) (duration: 00m 54s)
  • 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9204768: Enable DNS blacklist for es.wikinews (T237151) (duration: 00m 53s)
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0fc3909: Allow FlaggedRevs autoreview permission to be assigned globally (duration: 00m 54s)
  • 18:30 andrew@deploy1001: Finished deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode (duration: 03m 27s)
  • 18:26 andrew@deploy1001: Started deploy [horizon/deploy@1ac26da]: add new user-selected puppet edit mode
  • 18:24 ppchelko@deploy1001: Finished deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902 (duration: 14m 30s)
  • 18:17 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts (duration: 12m 07s)
  • 18:09 ppchelko@deploy1001: Started deploy [restbase/deploy@20c710d]: Bump Parsoid-PHP mirroring to 100% T235902
  • 18:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2cb2dde]: Event logging via Event Gate and Absolute classpath for munge and runUpdate scripts
  • 17:41 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Update for YAML-reading (offline) (duration: 00m 52s)
  • 17:39 jforrester@deploy1001: Synchronized wmf-config/config/: Sync out YAML config files (duration: 00m 56s)
  • 15:43 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable revert counts on beta (T234955) (duration: 00m 53s)
  • 15:36 jynus: running failing check_private_data report on labsdb1009 T235743
  • 15:33 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 00m 59s)
  • 15:01 joal@deploy1001: Started restart [analytics/aqs/deploy@59a97fa]: (no justification provided)
  • 14:36 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:36 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 ema: upload trafficserver 8.0.5-1wm10 to stretch-wikimedia
  • 13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:47 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:38 elukey: update bacula terms on analytics-in{4,6} filters on cr{1,2}-eqiad - T237016
  • 13:28 jbond42: update production puppetmasters to use new puppetdb servers
  • 13:20 Amir1: Creating Mon Wikipedia is done T235739
  • 13:19 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 13:16 ladsgroup@deploy1001: Synchronized langlist: T235739 (duration: 00m 52s)
  • 13:15 ladsgroup@deploy1001: Synchronized static/images/project-logos/: T235739 (duration: 00m 53s)
  • 13:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T235739 (duration: 00m 53s)
  • 13:13 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T235739 (duration: 00m 52s)
  • 13:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: T235739
  • 13:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 53s)
  • 13:06 ema: depool cp5012 and reimage as text_ats T227432
  • 12:21 Urbanecm: EU SWAT done
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 2/2) (duration: 00m 52s)
  • 12:12 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki* (T236905)
  • 12:11 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 7c1c64c: Add localized Minerva wordmark for Sindhi Wikipedia (T200870; 1/2) (duration: 00m 53s)
  • 12:08 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: a6d64b1: Update logo for zh-classical Wikipedia (T236905) (duration: 00m 53s)
  • 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c92a13c: Enable partial blocks on kowiki (T236752) (duration: 00m 54s)
  • 12:00 moritzm: upgrading mw1261 to PHP 7.2.24 (T237239)
  • 11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 11:08 moritzm: uploaded PHP 7.2.24 to apt.wikimedia.org stretch-wikimedia/component/php72 (T237239)
  • 04:53 vgutierrez: Switch from nginx to ats-tls on cp5009 - T231627
  • 04:39 vgutierrez: Switch from nginx to ats-tls on cp5008 - T231627

2019-11-03

  • 03:54 andrew@deploy1001: Finished deploy [horizon/deploy@0c024d4]: one more prefix fix (duration: 03m 35s)
  • 03:50 andrew@deploy1001: Started deploy [horizon/deploy@0c024d4]: one more prefix fix
  • 03:10 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try) (duration: 00m 25s)
  • 03:10 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try)
  • 03:09 andrew@deploy1001: Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (duration: 06m 01s)
  • 03:03 andrew@deploy1001: Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation

2019-11-02

  • 00:58 mutante: gerrit-replica - created missing /var/lib/gerrit2/review_site/tmp and restarted service - service back up on buster (T176774)
  • 00:34 mutante: gerrit-replica - fixing permissions of files in /srv/gerrit and restarting
  • 00:27 mutante: gerrit2001 - copy mysql-connector-java.jar into /usr/share/java/ and link it into /var/lib/gerrit2/review_site/lib (T176774)
  • 00:05 mutante: rsyncing gerrit plugin dir from gerrit1001 to gerrit2001 (T176774)

2019-11-01

  • 23:45 mutante: rsyncing gerrit git data from gerrit1001 to gerrit2001 (using --delete too!) T176774
  • 22:00 mutante: gerrit - repo sync between gerrit and gerrit-replica in progress .. if you can't clone from replica you can use main gerrit and replica will come back
  • 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: T237126 Fixing DOM in upload interface of UploadWizard (duration: 00m 56s)
  • 21:06 mutante: scp /usr/share/java/mysql-connector-java.jar from gerrit1001 to gerrit2001 (T176774)
  • 20:46 cdanis: add to bot_blocked_nets the IPs of several EC2 instances sending expensive requests to ORES T237134
  • 19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:37 mutante: gerrit2001 - reinstalling with buster
  • 19:03 volker-e@deploy1001: Finished deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration (duration: 00m 11s)
  • 19:03 volker-e@deploy1001: Started deploy [design/style-guide@4abbc70]: Add wikimedia deployment (scap) configuration
  • 16:39 XioNoX: push Add BGP_from_LVS policy and term vmhost to loopback4 filter to CRs
  • 16:37 ema: pool cp5011 with ATS backend T227432
  • 16:16 XioNoX: asw2-a-eqiad# run request system license add terminal
  • 15:39 moritzm: installing libonig security updates
  • 15:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 moritzm: installing libpcap security updates
  • 15:11 moritzm: installing python-ecdsa security updates
  • 14:34 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:34 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 ema: depool cp5011 and reimage as text_ats T227432
  • 14:02 moritzm: rebooting kafka-main1004 for microcode tests
  • 14:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:56 moritzm: upgrading mwdebug2002 to PHP 7.2.24 for some smoke tests with the new build
  • 12:18 ema: pool cp5010 with ATS backend T227432
  • 11:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:21 ema: depool cp5010 and reimage as text_ats T227432
  • 11:08 effie: enable puppet mediawiki and prometheus servers
  • 10:54 effie: remove prometheus-hhvm-exporter package from mw* servers - T229792
  • 10:37 moritzm: installing clamav security updates on mendelevium
  • 10:33 effie: Disable puppet on mediawiki and prometheus servers to remove hhvm exporters - T229792
  • 09:28 moritzm: installing file security updates on jessie
  • 09:21 effie: depool mw1317
  • 09:19 moritzm: installing golang-1.11 security updates
  • 08:57 moritzm: installing ruby-loofah security updates
  • 08:17 moritzm: installing libarchive security updates
  • 01:58 volker-e@deploy1001: Finished deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements (duration: 00m 05s)
  • 01:58 volker-e@deploy1001: Started deploy [design/style-guide@4d8d085]: deploying design/style-guide with mobile layout improvements
  • 01:21 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/resources/src/mediawiki.widgets/mw.widgets.UsersMultiselectWidget.js: T236460 mw.widgets.UsersMultiselectWidget: Fix property name (duration: 00m 54s)

2019-10-31

  • 23:33 Urbanecm: Evening SWAT done
  • 23:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice/extension.json: SWAT: dcd3ec3: Fix error in CentralNoticeImpression schema (T236627) (duration: 00m 51s)
  • 23:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/VisualEditor/: SWAT: 3686b82: Revert "Parse relative hrefs on image nodes like on regular links" (T237040) (duration: 00m 53s)
  • 23:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 02bf4b8: Re-enable mobile editor A/B testing (T236337) (duration: 00m 52s)
  • 23:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/bawiki* (T237035)
  • 23:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 54ee973: Change bawiki logo to an anniversary one (T237035) (duration: 00m 53s)
  • 23:04 eileen: civicrm revision changed from d2045c6b98 to 1183915bde, config revision is 1a709a61aa
  • 23:00 mutante: replacing deployment keys for apache2secmod ; re-arming keyholder on deployment server
  • 22:51 XioNoX: Homer push to cr1/2-eqiad
  • 22:17 XioNoX: Homer push to cr1/2-codfw
  • 22:14 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 00m 06s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:12 mutante: vega sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:12 twentyafterfour@deploy1001: deploy aborted: testing deploy_design (duration: 05m 07s)
  • 22:12 mutante: bromine sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \;
  • 22:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 22:05 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 01m 30s)
  • 22:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: testing deploy_design
  • 21:59 mutante: deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677)
  • 21:49 mutante: deploy1001 keyholder restart, keyholder arm ...
  • 21:46 mutante: deploy1001 - move apach2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677)
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 (duration: 13m 44s)
  • 21:25 robh: setting up ps1-b8-eqiad per T227543. it will reboot twice in the next 15 minutes, and then should start to clear up in icinga
  • 21:18 ppchelko@deploy1001: Started deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902
  • 20:35 XioNoX: Homer push to all cr2-eqdfw - new NTP servers, remove border-in4 term unused-ips, add (unused) BGP_Wikimedia_pops, re-order ospf interfaces
  • 20:27 shdubsh: restarting logstash on logstash1008 to test level->severity filter selector
  • 20:12 XioNoX: Homer push to all msw* - new NTP servers - T237011
  • 20:07 XioNoX: Homer push to all asw* - new NTP servers - T237011
  • 19:49 XioNoX: Homer push to eqsin
  • 19:49 mutante: rsyncing home dirs from previous gerrit server cobalt to gerrit1001
  • 19:36 fdans@deploy1001: Finished deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt (duration: 06m 53s)
  • 19:31 XioNoX: Homer push to ulsfo
  • 19:29 fdans@deploy1001: Started deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt
  • 19:08 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.4
  • 18:22 Urbanecm: Morning SWAT done
  • 18:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/CentralNotice: SWAT: 3e5b33f: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 00m 55s)
  • 18:20 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/CentralNotice: SWAT: 963e963: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 01m 01s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe08fbb: Undeploy reader surveys in English, Polish, and Russian (T232525) (duration: 01m 02s)
  • 18:01 fdans@deploy1001: Finished deploy [analytics/refinery@8ca04df]: deploying refinery (duration: 01m 09s)
  • 18:00 fdans@deploy1001: Started deploy [analytics/refinery@8ca04df]: deploying refinery
  • 16:23 bd808: Our @wikimediatech Twitter account is soft blocked pending phone number verification. bd808 trying to figure out a good way to do that verification for a bot account.
  • 16:14 jynus: restart dbprov2002 after upgrade T236924
  • 16:09 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 100%', diff saved to https://phabricator.wikimedia.org/P9513 and previous config saved to /var/cache/conftool/dbconfig/20191031-160925-jynus.json
  • 15:28 jgleeson: Updated paymentswiki from e28bc54e85 to 0de9d96208
  • 14:56 Urbanecm: Password reset for SUL user `Darth AK`
  • 14:50 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119 at 10%', diff saved to https://phabricator.wikimedia.org/P9512 and previous config saved to /var/cache/conftool/dbconfig/20191031-145010-jynus.json
  • 14:28 jynus: reloading ferm on db1119
  • 14:24 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P9511 and previous config saved to /var/cache/conftool/dbconfig/20191031-142455-jynus.json
  • 13:40 effie: upload xdebug 2.7.0-1+wmf2 to component/php72 - T234418
  • 13:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s)
  • 13:16 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json
  • 11:48 jynus: setting pc1008 as a replica of active pc1010
  • 11:43 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s)
  • 11:37 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json
  • 11:24 Urbanecm: EU SWAT done
  • 11:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/ProofreadPage/: SWAT: e0d5ce9: Add page navigation tabs in correct order skin-side and remove js requirement for Vector tab icons (T231250); ed17da2: Makes sure that Vector default background does not override the navigation arrows (T236969) (duration: 01m 02s)
  • 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 547086|Enable ContentTranslation out of Beta in Albanian WP (T236064) (duration: 01m 02s)
  • 11:03 ema: cp5008: restart ats-be to clear "backend process restarted" alert
  • 11:00 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:59 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:54 godog: bounce logstash on logstash2004
  • 10:39 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:38 ema: pool cp5009 with ATS backend T227432
  • 10:37 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:35 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:30 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:29 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:19 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:13 godog: bounce logstash on logstash2004
  • 10:07 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 10:05 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:43 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 godog: temporarily stop logstash on logstash2005 to test performance with two ingesters only - T215904
  • 09:23 godog: temporarily stop logstash on logstash2006 to test performance with two ingesters only - T215904
  • 09:10 ema: depool cp5009 and reimage as text_ats T227432
  • 08:25 ariel@deploy1001: Finished deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation (duration: 00m 03s)
  • 08:25 ariel@deploy1001: Started deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation
  • 06:37 elukey: upgrade cergen to 0.2.5 on puppetmaster1001
  • 03:44 vgutierrez: switch from nginx to ats-tls on cp4032 - T231627
  • 03:09 vgutierrez: switch from nginx to ats-tls on cp4031 - T231627
  • 02:51 vgutierrez: switch from nginx to ats-tls on cp4030 - T231627
  • 01:41 eileen: civicrm revision changed from 0547c84f73 to d2045c6b98, config revision is 1a709a61aa (looks like patch was still hung in gerrit last time)
  • 01:34 eileen: civicrm revision is 0547c84f73, config revision is 1a709a61aa - that should stop those failmails
  • 00:40 jforrester@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/WikiLove/resources/ext.wikiLove.icon.vector.css: T236958 Fix Vector icon after upstream change (duration: 01m 02s)
  • 00:38 eileen: civicrm revision changed from a55c2d2787 to 0547c84f73, config revision is 1a709a61aa

2019-10-30

  • 23:21 ejegg: updated fundraising python tools from ffc7bf764b to a93eec292d
  • 23:08 XioNoX: power cycle cr3-esams re1 - T236598
  • 22:29 mutante: scandium - live hack /srv/mediawiki/wmf-config/InitialiseSettings.php - set wmgMemoryLimit to 850 (*1024 *1024), restart php7.2-fpm (T236833)
  • 22:22 andrew@deploy1001: Finished deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode (duration: 03m 15s)
  • 22:19 andrew@deploy1001: Started deploy [horizon/deploy@2d551d8]: Rolling out a currently-turned-off puppet edit mode
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837 (duration: 13m 54s)
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@fa934c8]: Bump parsoid mirroring to 25% and fix 412: T235902, T236837
  • 21:31 ppchelko@deploy1001: Finished deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838 (duration: 14m 04s)
  • 21:17 ppchelko@deploy1001: Started deploy [restbase/deploy@88cf547]: Parsoid mirroring followups: T236837, T236838
  • 20:47 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:47 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:46 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 20:46 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:42 arlolra: Updated Parsoid to 5ac1623 (T235656, T233818, T234549, T227209, T236112)
  • 20:29 otto@deploy1001: Synchronized wmf-config/LabsServices.php: Syncing LabsServices.php change for beta eventgate instance replacement (duration: 01m 01s)
  • 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623 (duration: 09m 10s)
  • 20:25 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 18s)
  • 20:24 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@a69ec92]: Updating Parsoid to 5ac1623
  • 20:17 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: WikimediaEditorTasks: Enable edit streaks on beta (duration: 01m 03s)
  • 20:11 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 03s)
  • 20:11 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:10 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 51s)
  • 20:09 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:07 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 07s)
  • 20:07 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:06 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 23s)
  • 20:06 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 05s)
  • 20:03 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 19:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 19:06 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.4 (duration: 01m 00s)
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.4
  • 19:05 mutante: moscovium - stop and remove rsync server, purge rsync package T180641
  • 18:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T222851 Migrate to Kask for Echo seen-time storage (duration: 01m 01s)
  • 17:43 elukey: upload cergen 0.2.5-1+deb10u1 to buster-wikimedia component/cergen
  • 17:41 elukey: run reprepro clearvanished on install1002 to clean leftovers of buster-wikimedia|thirdparty/elastic7
  • 17:37 twentyafterfour@deploy1001: Finished deploy [design/style-guide@4d8d085]: (no justification provided) (duration: 00m 04s)
  • 17:37 twentyafterfour@deploy1001: Started deploy [design/style-guide@4d8d085]: (no justification provided)
  • 17:29 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Revert 16:05 UTC T236928 (duration: 01m 05s)
  • 17:26 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Revert 16:02 UTC T236928 (duration: 01m 04s)
  • 16:59 jynus: killed rebuildItemTerms on mwmaint1002
  • 16:05 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T234948) (duration: 01m 04s)
  • 16:02 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Stop locking and use DISTINCT when finding used terms to delete (T236466) (duration: 01m 05s)
  • 15:48 godog: roll restart logstash after https://gerrit.wikimedia.org/r/c/operations/puppet/+/544217
  • 15:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 06s)
  • 15:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.4/extensions/Wikibase: Wikibase deadlock reduction, Shorten out when there is nothing to clean up (T236466) (duration: 01m 05s)
  • 15:36 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 15:29 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 15:23 gehel: shutting down elastic1039 to be ready for disk swap - T236601
  • 15:10 effie: enable-puppet in mw* hosts
  • 15:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 14:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T210174 Load Wikisource extension when wmgUseWikisource is true (duration: 01m 01s)
  • 14:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T236502 Define wmgUseWikisource as default-false (duration: 01m 22s)
  • 14:40 ema: pool cp5008 with ATS backend T227432
  • 14:32 effie: disable puppet on all mw* hosts
  • 14:20 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 14:19 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 14:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 13:39 andrew@deploy1001: Finished deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver (duration: 03m 38s)
  • 13:36 andrew@deploy1001: Started deploy [horizon/deploy@53028ab]: Rolling out improvements to the puppet git archiver
  • 12:59 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=cp5008.eqsin.wmnet
  • 12:58 moritzm: rolling restart of slapd to pick up LDAP schema change
  • 12:57 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet
  • 12:50 arturo: updating package versions in install1002 for thirdparty/kubeadm-k8s stretch-wikimedia (T236824)
  • 12:23 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:22 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 moritzm: temporarily disabling puppet on LDAP servers for a schema change
  • 11:42 ema: depool cp5008 and reimage as text_ats T227432
  • 11:37 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 11:31 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase rate limits for newbie non-ip users on Commons (duration: 01m 01s)
  • 11:13 Urbanecm: EU SWAT done
  • 11:12 Urbanecm: Synchronized wmf-config/InitialiseSettings.php: SWAT: 61cb77c: Re-apply: MCR: Set testwiki to use the new MCR-only schema (T198558) (duration: 00m 59s)
  • 10:07 jynus: restarting bacula-dir, bacula-sd on backup1001 T236406
  • 09:46 vgutierrez: Switch from nginx to ats-tls on cp4029 - T231627
  • 09:34 vgutierrez: Switch from nginx to ats-tls on cp4028 - T231627
  • 09:25 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 08:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 08:25 moritzm: installing php7.0 security updates
  • 07:58 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 07:57 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 05:58 vgutierrez: Rolling restart of ats-tls to get rid of leaked sockets and benefit from the lower inactivity timeout - T236458
  • 04:24 vgutierrez: restarting ats-tls on cp4027 with half open disabled - T236458
  • 03:09 vgutierrez: Rolling restart of prometheus-exporter-trafficserver-tls - T236458
  • 02:40 vgutierrez: restarting ats-tls on cp3050 with half open disabled - T236458
  • 00:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php

2019-10-29

  • 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 23:09 mutante: ganeti1003 - gnt-instance remove ununpentium.wikimedia.org (T236748)
  • 23:05 Urbanecm: Evening SWAT done
  • 23:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/atjwiki* (T236777)
  • 23:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: f7b9972: Revert "Milestone lobo for atjwiki" (T236777) (duration: 01m 01s)
  • 22:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:17 mutante: ununpentium - shutdown Ganeti VM - running decom script, schedule icinga downtime (T236748)
  • 22:14 mutante: rsynced data dump and config from ununpentium to moscovium in /srv/ before shutting down the old server (T180641)
  • 20:43 papaul: rebooting cp3056 for HW check
  • 20:19 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw complete (T235654)
  • 19:42 andrew@deploy1001: Finished deploy [horizon/deploy@dbe892e]: (no justification provided) (duration: 03m 59s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@dbe892e]: (no justification provided)
  • 19:32 jynus: restarting bacula-fd on install1002 T236406
  • 19:31 andrew@deploy1001: Finished deploy [horizon/deploy@bab5d37]: (no justification provided) (duration: 01m 35s)
  • 19:30 andrew@deploy1001: Started deploy [horizon/deploy@bab5d37]: (no justification provided)
  • 19:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.4
  • 19:14 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache (duration: 21m 11s)
  • 18:54 jynus@cumin1001: dbctl commit (dc=all): 'Revert state to before overload+maintenance', diff saved to https://phabricator.wikimedia.org/P9501 and previous config saved to /var/cache/conftool/dbconfig/20191029-185438-jynus.json
  • 18:53 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.4 and rebuild l10n cache
  • 18:53 Trey314159: reindexing Slovak wikis on elastic@eqiad and elastic@codfw (T235654)
  • 18:50 brennen@deploy1001: Pruned MediaWiki: 1.35.0-wmf.1 (duration: 08m 09s)
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902 (duration: 14m 13s)
  • 18:07 ppchelko@deploy1001: Started deploy [restbase/deploy@cf80130]: Mirror 10% of /page/html/ traffic to Parsoid/PHP T235902
  • 17:42 brennen: cutting branch for 1.35.0-wmf.4
  • 17:38 mutante: phab1001 - upgrading php7.3 packages
  • 17:34 mutante: phab2001 - upgrading PHP packages
  • 17:06 jynus@cumin1001: dbctl commit (dc=all): 'repool db1099 both instances fully to increase redundancy', diff saved to https://phabricator.wikimedia.org/P9499 and previous config saved to /var/cache/conftool/dbconfig/20191029-170648-jynus.json
  • 16:56 jynus@cumin1001: dbctl commit (dc=all): 'depool fully db1105:3311, stability/lag issues', diff saved to https://phabricator.wikimedia.org/P9498 and previous config saved to /var/cache/conftool/dbconfig/20191029-165633-jynus.json
  • 16:52 ssastry@deploy1001: Finished deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d (duration: 09m 35s)
  • 16:46 jynus@cumin1001: dbctl commit (dc=all): 'pool db1106 into s1 rcs', diff saved to https://phabricator.wikimedia.org/P9497 and previous config saved to /var/cache/conftool/dbconfig/20191029-164640-jynus.json
  • 16:43 ssastry@deploy1001: Started deploy [parsoid/deploy@aa59ce3]: Update parsoid to 089bf28d
  • 16:39 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 16:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 16:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 16:28 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 06m 11s)
  • 16:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 16:22 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 16:20 mutante: reloading nginx on wtp*
  • 15:57 bstorm_: restarted ferm on labstore1006 -- it failed an external DNS lookup due to brief issues apparently on the other end
  • 15:25 vgutierrez: restarting ats-tls on cp5007 with a default inactivity timeout of 5 minutes and half open disabled - T236458
  • 15:04 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 15:01 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 14:58 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 14:45 robh: setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538
  • 14:32 Krinkle: krinkle@webperf1001.eqiad Restart navtiming, coal and statsv services
  • 14:29 elukey: upgrade python-kafka on webperf[12]001 - T234808
  • 14:27 Krinkle: krinkle@webperf2001 Restart navtiming, coal and statsv services
  • 12:32 hashar: Restarting Zuul / Jenkins
  • 12:31 hashar: Stopping Zuul / Jenkins for upgrade
  • 12:29 akosiaris: delete all production00 volumes on backup1001
  • 11:48 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 11:37 Urbanecm: EU SWAT done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: faeb8f1: Allow AbuseFilter to issue blocks on es.wikinews (T236730) (duration: 00m 53s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fc9920e: Rename Author talk namespace at thwikisource (T236640) (duration: 00m 56s)
  • 11:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 11:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 10:51 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:51 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:46 jakob@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:39 jakob@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 10:33 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 10:29 moritzm: installing php5 security updates
  • 10:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 10:21 jynus: running import on m1-master, m1 replicas will lag for a while T236406
  • 10:20 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 XioNoX: disable cr3-esams:et-1/0/0 (flapping)
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 gehel: plugin upgrade on relforge - T236123
  • 09:27 godog: reimage elastic 7 hw with Buster
  • 09:27 vgutierrez: restart ats-tls on cp5007 disabling TCP SO_LINGER - T236458
  • 08:43 jynus: shutting down db1099 T227538
  • 08:35 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1099', diff saved to https://phabricator.wikimedia.org/P9492 and previous config saved to /var/cache/conftool/dbconfig/20191029-083547-jynus.json
  • 08:15 XioNoX: push term allow_vmhost to cr3-esams loopback4 filter - T236598
  • 08:06 vgutierrez: restarting ats-tls on cp5007 with TCP FASTOPEN disabled - T236458
  • 07:40 moritzm: installing php7.3 security updates
  • 07:06 elukey: roll restart java daemons on analytics1042, druid1003 and aqs1004 to pick up new openjdk upgrades
  • 07:01 _joe_: restart memcached on mc1024-1036, 1 hour apart, via cumin (T235188)
  • 06:26 _joe_: restart memcached on mc1023 T23518
  • 03:35 vgutierrez: restarting varnish-frontend on cp5008

2019-10-28

  • 23:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy Echo kask migration to officewiki for testing, part 3 (T222851) (duration: 00m 52s)
  • 23:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy Echo kask migration to officewiki for testing, part 2 (T222851) (duration: 00m 52s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/ProductionServices.php: Deploy Echo kask migration to officewiki for testing, part 1 (T222851) (duration: 00m 54s)
  • 23:18 mutante: re-enabling puppet on moscovium (RT)
  • 22:02 ejegg: re-enabled basic fundraising jobs (Queue consumers, audit processors, TY mailer)
  • 20:56 cdanis: restart memcached on mc1022 T235188
  • 20:37 Jeff_Green: authdns update to switch fundraising db service hostname
  • 20:19 ejegg: disabled all fundraising scheduled jobs
  • 19:50 rlazarus: restarted memcached on mc1021 (T235188)
  • 19:41 ssastry@deploy1001: Finished deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d (duration: 02m 42s)
  • 19:38 ssastry@deploy1001: Started deploy [parsoid/deploy@d932d6a]: Update parsoid to 089bf28d
  • 18:53 moritzm: updating PHP on people1001
  • 18:52 Urbanecm: Morning SWAT done
  • 18:42 urbanecm@deploy1001: Synchronized wmf-config/logging.php: SWAT: 1a09e2a: Direct Parsoid/PHP logs to a parsoid-php log "type" (T235899) (duration: 00m 52s)
  • 18:41 rlazarus: restarted memcached on mc1020 T235188
  • 18:32 mutante: moscovium - rename all files in /etc/request-tracker4/RT_SiteConfig.d to have a .pm extension - this fixed RT - login works again - puppet patch coming up (T180641)
  • 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 30111f3: Enable mapframe at kawiki (T229726) (duration: 00m 53s)
  • 18:28 mutante: moscovium - deleting /etc/request-tracker4/RT_SiteConfig.d/ 50-debconf.pm and 51-dbconfig-common.pm which duplicate the same files without .pm extension with wrong values, probably due to some package change (T180641)
  • 18:27 jgleeson: updated paymentswiki from 7bb9f5257e to e28bc54e85
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: c48271d: Revert "Config changes for Echo kask migration" (T222851) (duration: 00m 53s)
  • 18:24 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditor.php: SWAT: b19ad5f: Revert "Revert "ApiVisualEditor: Return etag with content for preloaded content""; 4f3b724: ApiVisualEditor: Fix preload handling further (T233320) (duration: 00m 53s)
  • 18:15 Urbanecm: Run mwscript namespaceDupes.php --wiki=thwikisource --fix (T236640)
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ea927dd: Rename author NS at thwikisource (T236640) (duration: 00m 53s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: ddaa534: Config changes for Echo kask migration (T222851) (duration: 00m 55s)
  • 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:12 bblack: mr1-eqiad: fix bast3004 access for eqiad mgmt network - T236686
  • 17:11 _joe_: starting rolling restart of memcached servers in eqiad, beginning with mc1019 T235188
  • 17:11 bblack: mr1-codfw: fix bast3004 access for codfw mgmt network - T236686
  • 17:10 bblack: mr1-ulsfo: fix bast3004 access for ulsfo mgmt network - T236686
  • 16:57 bblack: mr1-eqsin: fix bast3004 access for eqsin mgmt network - T236686
  • 16:56 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:55 bblack: mr1-esams: fix bast3004 access for esams mgmt network - T236686
  • 16:36 jbond42: restart puppetdb on puppetdb1001 to remove queue
  • 13:50 ema: pool cp5007 with ATS backend T227432
  • 13:30 godog: roll restart logstash in codfw/eqiad to apply new config
  • 13:23 effie: enable puppet on mw1*, depool and repool to reload apache - T229792
  • 13:13 effie: enable puppet on mw[1261-1265].eqiad.wmnet (mw canaries), depool and repool to reload apache - T229792
  • 13:07 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:07 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 effie: enable puppet on mw2* servers, depool and repool to reload apache - T229792
  • 13:01 jynus: stop db1114 for testing
  • 12:30 ema: depool cp5007 and reimage as text_ats T227432
  • 12:22 effie: depool mw2150
  • 11:56 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001 (duration: 00m 05s)
  • 11:56 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: testing deployment of phabricator to phab1001
  • 11:34 Urbanecm: EU SWAT done
  • 11:33 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: 8caf681: Dont log missing ETags when creating a new page, thats normal (T233320) (duration: 00m 54s)
  • 11:33 effie: Disable puppet on mw* for 545652 - T229792
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dd2f06c: Add Translate channel for the Translate extension (T221119) (duration: 00m 53s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ff17666: Adjust wgUploadNavigationUrl for azwiki to point to commons UpWiz (T236307) (duration: 00m 53s)
  • 11:05 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 7e26ef4: Revert "Restrict uploads on azwiki" (T236307) (duration: 00m 53s)
  • 11:02 moritzm: installing OpenJDK security updates on elastic*
  • 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 08:48 godog: bump udp_localhost kafka-logging topics to 6 partitions and roll-restart logstash and rsyslog - T215904
  • 08:26 volans: manually cleanup changes reverted in https://gerrit.wikimedia.org/r/546407 on icinga[12]001 - T222074
  • 08:25 moritzm: installing file/libmagic security updates
  • 08:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791 (duration: 13m 42s)
  • 08:15 godog: swift eqiad-prod: final weight to ms-be105[1-6] - T232367
  • 08:02 mobrovac@deploy1001: Started deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791
  • 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389 (duration: 13m 44s)
  • 07:40 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt (duration: 00m 05s)
  • 07:40 elukey@deploy1001: Started deploy [eventlogging/analytics@0f1ad6d]: Move codebase to Python3 - second attempt
  • 07:37 elukey: upload archiva 2.2.4-1 to wikimedia-stretch (fix to avoid overriding archiva.xml upon install)
  • 07:27 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389
  • 07:25 mobrovac@deploy1001: Finished deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org (duration: 02m 37s)
  • 07:22 mobrovac@deploy1001: Started deploy [restbase/deploy@c500d7a] (dev-cluster): Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org

2019-10-26

  • 11:30 XioNoX: restart cr3-esams
  • 11:01 XioNoX: re0.cr3-esams> request chassis routing-engine master switch

2019-10-25

  • 22:55 mutante: moscovium rm /dev/shm/envoy_shared_memory_0 to revive envoy which failed to run after changing ports and reinstalling it (T180641)
  • 22:42 mutante: moscovium - manually deleting envoy listener on 1443 and letting puppet recreate config because it's not removed if you change the port (T180641)
  • 21:55 mutante: running puppet on ulsfo cp-ats servers to pick up config change for RT backend
  • 20:42 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes (duration: 00m 06s)
  • 20:41 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: deploying design/style-guide for demonstration purposes
  • 20:04 twentyafterfour@deploy1001: Finished deploy [design/style-guide@c69242e]: test deploy design/style-guide (duration: 00m 10s)
  • 20:04 twentyafterfour@deploy1001: Started deploy [design/style-guide@c69242e]: test deploy design/style-guide
  • 17:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 bblack: lvs3005 - reimaging to fix partman issue, high-traffic1 (text) to lvs3007 for the duration
  • 16:43 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:19 bblack: lvs3006 - reimaging to fix partman issue, high-traffic2 (upload/maps) to lvs3007 for the duration
  • 16:19 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292 (duration: 13m 31s)
  • 16:05 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox1001) T223292
  • 16:04 crusnov@deploy1001: Finished deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292 (duration: 00m 43s)
  • 16:04 crusnov@deploy1001: Started deploy [netbox/deploy@0f4c92d]: deploy netbox scripts update (netbox2001) T223292
  • 15:35 robh: ps1-oe14-esams ip info set, rebooting (won't affect servers) via T184066
  • 15:03 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 15:01 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:00 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 bblack: cr[23]-esams: re-route ns2 IP to ganeti3003
  • 14:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292 (duration: 00m 44s)
  • 14:31 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) -T223292
  • 14:30 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292 (duration: 00m 05s)
  • 14:30 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts (netbox2001) T223292
  • 14:28 crusnov@deploy1001: Finished deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292 (duration: 01m 02s)
  • 14:27 crusnov@deploy1001: Started deploy [netbox/deploy@690f9ae]: deploy netbox scripts T223292
  • 14:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:10 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:09 bblack: reboot ganeti3003
  • 13:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 ema: pool cp4032 with ATS backend T227432
  • 13:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 effie: depool mw1334 and pool back
  • 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:28 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4032.ulsfo.wmnet,service=ats-be
  • 13:05 ema: depool cp4032 and reimage as text_ats T227432
  • 12:34 jynus: introducing new freshness check for bacula T234900
  • 12:11 ema: pool cp4031 with ATS backend T227432
  • 10:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:59 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4031.ulsfo.wmnet,service=ats-be
  • 09:56 ema: depool cp4031 and reimage as text_ats T227432
  • 09:39 ema: pool cp4030 with ATS backend T227432
  • 09:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 XioNoX: powering off mr1-esams again
  • 09:20 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 XioNoX: going to power down mr1-esams (esams mgmt is going to go down) for 30min, the time needed to move power cables
  • 09:02 jynus: disabling persistent journald on db1074
  • 09:01 ema@cumin1001: conftool action : set/weight=100; selector: name=cp4030.ulsfo.wmnet,service=ats-be
  • 08:58 ema: depool cp4030 and reimage as text_ats T227432
  • 08:48 vgutierrez: switch from nginx to ats-tls on cp3050 - T231627
  • 08:45 godog: stop prometheus on bast300[24] and done last round of rsync data - T236329
  • 08:37 ema: lvs1015: restart pybal to add labweb-ssl T210411
  • 08:36 ema: test
  • 08:34 ema@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
  • 08:32 ema: lvs1016: restart pybal to add labweb-ssl T210411
  • 08:02 vgutierrez: rolling restart of ats-tls to introduce a SSL handshake timeout of 60 secs - T236458
  • 07:35 akosiaris: reboot webperf1002 for disk resize T235455
  • 07:29 akosiaris: reboot webperf2002 for disk resize T235455
  • 05:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:35 vgutierrez: reimage lvs3007 to let it get the proper partman configuration - T236294
  • 05:03 vgutierrez: Applying a SSL handshake timeout of 60 secs on ats-tls/cp5007 - T236458
  • 04:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:55 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:53 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:52 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:50 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 04:49 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:24 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns300.*
  • 03:24 bblack@cumin1001: conftool action : set/weight=1; selector: name=dns3001.*
  • 03:08 bblack: cr2-esams + cr3-esams : remove nescio and maerlant from anycast4 neighbor list
  • 03:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 03:05 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3049.esams.wmnet
  • 02:45 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 02:44 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3043.esams.wmnet
  • 02:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 02:09 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 01:52 bblack: mr1-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:50 bblack: asw2-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:46 bblack: cr2-esams + cr3-esams: switch ntp peers list to use dns300[12] instead of nescio/maerlant
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
  • 01:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 01:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3047.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet
  • 01:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3046.esams.wmnet
  • 01:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
  • 01:13 mutante: puppetmaster1001 - revoking parsoid.svc.eqiad / parsoid.svc.codfw / parsoid.discovery.wmnet certificates and creating new ones including parsoid-php.discovery.wmnet (T233654)
  • 00:52 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/LiquidThreads/classes/View.php: (no justification provided) (duration: 00m 54s)

2019-10-24

  • 23:46 mutante: bast3002 - rsyncing /home, /srv/tfptboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329)
  • 23:24 krinkle@deploy1001: Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s)
  • 22:16 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:13 mutante: gerrit1001 - starting gerrit
  • 22:13 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 thcipriani: stopping gerrit briefly for script run for T236344
  • 22:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:01 mutante: mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. could cause monitoring spam on more servers (T229792)
  • 21:56 eileen: civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1
  • 21:16 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
  • 21:16 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet
  • 21:13 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
  • 21:12 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet
  • 21:06 bblack: cr3-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 bblack: cr2-esams remove pybal neighbor IPs for lvs3001-4
  • 21:05 urandom: restbase cassandra rolling restart, codfw / rack 'd' -- T200803
  • 21:02 bblack: downtimed lvs3001-4, stopping pybal there, etc...
  • 20:58 bblack: cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:58 bblack: cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005
  • 20:40 bblack: esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently)
  • 20:36 bblack: esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again)
  • 20:33 urandom: restbase cassandra rolling restart, codfw / rack 'c' -- T200803
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet
  • 20:24 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
  • 20:23 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet
  • 20:22 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 20:04 bblack: reboot cp3054 again for good measure
  • 19:57 bblack: cp3054 - trying racadm serveraction hardreset
  • 19:32 bblack: reboot dns3001
  • 19:31 urandom: restbase cassandra rolling restart, codfw / rack 'b' -- T200803
  • 19:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:06 urandom: restbase cassandra rolling restart, rack 'd' -- T200803
  • 19:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:01 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:00 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:59 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:57 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:55 Urbanecm: Morning SWAT done
  • 18:55 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 urandom: restbase cassandra rolling restart, rack 'b' -- T200803
  • 18:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:31 bblack: cr3-esams: add dns3001 to anycast4 neighbors
  • 18:30 bblack: cr2-esams: add dns3001 to anycast4 neighbors
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 263fd0f: Enable Wikibase client access on commonswiki (T223792) (duration: 00m 52s)
  • 18:25 urandom: restbase cassandra rolling restart, rack 'a' -- T200803
  • 18:22 robh: completing ps1-b6-eqiad setup, pdu will reboot twice, power output unaffected T227540
  • 18:20 robh: ps1-a6-eqiad setup complete, icinga errors should clear up T227142
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 84c48df: rename service definition (T222851) (duration: 00m 53s)
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b20d6de: Reference Previews: full beta deployment (T235083) (duration: 00m 52s)
  • 18:03 robh: setting ip info for ps1-a6-eqiad, it is rebooting. T227142
  • 17:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:38 ema: pool cp3059 (cache_upload) T233242
  • 17:29 bblack: asw2-esams - committing switch port/vlan config for new rack 14 hosts
  • 17:26 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable Parsoid/PHP in the whole wtp (a.k.a. Parsoid) cluster - T236388 (duration: 00m 53s)
  • 17:18 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:54 ema: depool cp3036 (cache_upload) T233242
  • 16:39 urandom: restarting cassandra, restbase2011 (canary for config changes) -- T200803
  • 16:32 urandom: restarting cassandra, restbase1016 (canary for config changes) -- T200803
  • 16:28 ema: depool cp3035 (cache_upload) T233242
  • 16:07 ema: pool cp3057 (cache_upload) T233242
  • 15:51 ema: depool cp3032 (cache_text) T233242
  • 15:45 ema: depool cp3034 (cache_upload) T233242
  • 15:40 ema: depool cp3030 (cache_text) T233242
  • 15:27 bblack: asw2-esams: configure port descriptions and vlan/lvs groupings for all rack16 hosts (lvs3007, ganeti3003, bast3004, cp3061-5)
  • 15:19 ema: pool cp3058 (cache_text) T233242
  • 15:18 effie: Slowly reload apache across the fleet (as we are enabling puppet) - T229792
  • 15:09 effie: Remove hhvm packages and enable puppet across the fleet - T229792
  • 15:09 ema: pool cp3055 (cache_upload) T233242
  • 15:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testcommonswiki, Enable Wikibase client access T223792 (duration: 00m 53s)
  • 15:00 bblack: cr2-esams - add missing lvs3005 IP to bgp pybal neighbor list
  • 14:58 bblack: cr3-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:58 bblack: cr2-esams - change fallback static route for high-traffic2 to lvs3006
  • 14:47 effie: run puppet on all canaries and codfw - T229792
  • 14:42 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:40 effie: Remove hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from all canaries and codfw - T229792
  • 14:40 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:26 bblack: lvs3006 (upload, becoming active) - manual pybal med s/90/0/ (will take over from lvs3002, intended permanently).
  • 14:23 bblack: lvs3006 (upload, inactive) - manual pybal med s/100/90/ (preferred to lvs3004 for fallback from lvs3002)
  • 14:22 effie: enable puppet on mw app canaries
  • 14:16 ema: power-cycle cp3056, stuck rebooting into d-i T233242
  • 13:59 ema: pool cp3060 T233242
  • 13:36 bblack: re-pooling esams in dns
  • 13:34 effie: enable puppet on mwdebug*
  • 13:25 XioNoX: enable transit4/6 on cr2-knams
  • 13:24 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=varnish-be,name=cp30[56].*
  • 13:24 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp30[56].*,service=varnish-be
  • 13:23 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=nginx
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=varnish-fe
  • 13:22 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=nginx
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3063.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3051.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3059.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3061.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3057.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3065.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3055.esams.wmnet
  • 13:18 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,name=cp3053.esams.wmnet
  • 13:17 ema: set ats-be weights on new esams upload nodes T233242
  • 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.3
  • 12:56 effie: purge hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from mw* canaries - T229792
  • 12:42 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp3060.esams.wmnet,service=varnish-be
  • 12:33 effie: Stopping puppet on all hosts including the hhvm class (C:hhvm) - 544864 - T229792
  • 12:25 ema: cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242
  • 12:14 bblack: depool esams in geodns
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 after analyze table', diff saved to https://phabricator.wikimedia.org/P9468 and previous config saved to /var/cache/conftool/dbconfig/20191024-120812-marostegui.json
  • 12:06 XioNoX: shutdown cr1-esams - cr2-knams link
  • 12:00 XioNoX: shutdown transit BGP sessions on cr2-knams
  • 11:40 Urbanecm: EU SWAT done
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3a5cb68: Permission changes of move-rootuserpages assignment at commonswiki (T236359) (duration: 01m 00s)
  • 11:33 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:31 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 Urbanecm: Run mwscript namespaceDupes.php --wiki=commonswiki --add-prefix=FIXME --fix (T236352)
  • 11:28 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e079956: Add CAT as alias for NS_CATEGORY at commonswiki (T236352) (duration: 01m 00s)
  • 11:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 2d66deb: Restrict uploads on azwiki (T236307) (duration: 01m 03s)
  • 11:15 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/WikibaseMediaInfo: Also use custom PrefetchingTermLookup in SingleEntitySourceServices (duration: 01m 01s)
  • 11:13 mlitn@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/Wikibase: Allow defining entity-type-specific PrefetchingTermLookup (duration: 01m 06s)
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:00 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights for db1093 and db1085', diff saved to https://phabricator.wikimedia.org/P9466 and previous config saved to /var/cache/conftool/dbconfig/20191024-101810-marostegui.json
  • 09:59 hashar: Converting CI jobs to use the new PostBuildScript plugin config | https://gerrit.wikimedia.org/r/#/c/integration/config/+/544907/ | T188398
  • 09:57 hashar: Restarting CI Jenkins
  • 09:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T234853 Re-enable performance perception survey on ruwiki (duration: 01m 04s)
  • 08:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:36 godog: roll restart rsyslog in codfw/eqiad to pick up new kafka partitions
  • 08:18 godog: roll restart rsyslog in ulsfo/esams/eqsin to pick up new kafka partitions
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092 for analyze table', diff saved to https://phabricator.wikimedia.org/P9465 and previous config saved to /var/cache/conftool/dbconfig/20191024-081519-marostegui.json
  • 07:57 XioNoX: reboot mr1-esams
  • 07:42 godog: bump rsyslog- topics partitions to 6 and roll-restart logstash frontends
  • 07:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:22 XioNoX: drain Telia link on cr2-esams
  • 06:32 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid-php,name=eqiad
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9463 and previous config saved to /var/cache/conftool/dbconfig/20191024-052002-marostegui.json
  • 05:18 marostegui: Run analyze enwiki.revision on db2092 T223151
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9462 and previous config saved to /var/cache/conftool/dbconfig/20191024-045954-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from special slaves group and leave it with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9461 and previous config saved to /var/cache/conftool/dbconfig/20191024-045924-marostegui.json
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3315 after compression', diff saved to https://phabricator.wikimedia.org/P9460 and previous config saved to /var/cache/conftool/dbconfig/20191024-045544-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:55 shdubsh: temporarily turn down accept delay on fermium - T235983
  • 00:03 mutante: restarting gerrit to increase heap_size from 20G to 32G (T225166 T222391)

2019-10-23

  • 22:55 brennen@deploy1001: Synchronized php-1.35.0-wmf.3/extensions/AbuseFilter: SWAT: Unbreak filter edit form (T236286) (duration: 01m 05s)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:20 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:20 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 05s)
  • 22:19 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:15 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 01m 10s)
  • 22:14 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 22:00 twentyafterfour@deploy1001: Finished deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server) (duration: 00m 21s)
  • 22:00 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: deploy to phab1001 (currently a warm spare server)
  • 21:32 mutante: webperf1002/2002 - starting bacula-fd service that is failed after initial puppet run turning them into backup::hosts
  • 21:14 ejegg: updated Fundraising python tools from b3c7453be2 to ffc7bf764b
  • 20:37 shdubsh: restart nagios-nrpe-server on stat1007
  • 18:56 milimetric@deploy1001: Finished deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts (duration: 07m 53s)
  • 18:49 milimetric@deploy1001: Started deploy [analytics/refinery@3aaabf6]: Minor: fix two scripts
  • 18:29 mforns@deploy1001: Finished deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59 (duration: 06m 40s)
  • 18:22 mforns@deploy1001: Started deploy [analytics/refinery@1110d59]: deploying refinery up to 1110d59
  • 17:31 akosiaris: restart varnish-be on cp1089 as a response to HTTP availability alerts. High mailbox lag
  • 17:25 akosiaris: restart varnish-be on cp1081 as a response to HTTP availability alerts
  • 15:55 _joe_: restarting pybal on lvs2006, then 2003 for picking up parsoid-php
  • 15:32 marostegui: Enable slow query log 1/20 on db1089 (enwiki) T223151
  • 14:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:39 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:38 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:37 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:35 ema@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:19 bblack: repooling esams
  • 14:00 hashar: Restarting CI Jenkins
  • 13:57 _joe_: manually changing the symlinked deployed version of parsoid on wtp1025 T236275
  • 13:35 XioNoX: migrate esams mgmt to new mgmt router
  • 13:34 effie: disable puppet on mwdebug1002 - T214734
  • 13:13 ssastry@deploy1001: Finished deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues (duration: 08m 44s)
  • 13:07 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.3 (duration: 01m 00s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.3
  • 13:04 ssastry@deploy1001: Started deploy [parsoid/deploy@451db1e]: Updating Parsoid to 5521ea74; Dummy Parsoid deploy to debug Parsoid/PHP deployment issues
  • 12:37 effie: Depool mwdebug1002 - T214734
  • 12:31 vgutierrez: restarting ats-tls on cache text nodes - T233274
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from the special slaves group on s5 and leave it back with its original pooling options T223151', diff saved to https://phabricator.wikimedia.org/P9454 and previous config saved to /var/cache/conftool/dbconfig/20191023-122708-marostegui.json
  • 11:26 XioNoX: powering down cr1-esams
  • 11:24 Urbanecm: EU SWAT done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: e21054e: Add Balinese to interwiki sort orders (T234768) (duration: 01m 01s)
  • 11:18 Urbanecm: mwscript updateArticleCount.php --wiki=frwikiquote --update (T236212)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (2/2; T234278) (duration: 01m 01s)
  • 11:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright: SWAT: 0889da0: Add custom Minerva wordmark for Hebrew wikivoyage (1/2; T234278) (duration: 01m 01s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf8e2f1: Set $wgArticleCountMethod to any for frwikiquote (T236212) (duration: 01m 12s)
  • 10:46 ema: cp-ats: rolling ATS backend restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545522/ T233274
  • 10:13 jynus: reverting dbtree revision to HEAD~1 T224589
  • 10:11 jynus: deploying new version of dbtree T224589
  • 10:04 ema: cp1075: ats-backend-restart to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/545508/
  • 09:42 godog: bounce burrow-logging-eqiad.service on kafkamon1001
  • 09:40 godog: roll restart logstash to pick up new rsyslog-notice partitions
  • 09:31 godog: bump rsyslog-notice topic to 6 partitions
  • 09:00 moritzm: rebooting logstash2021 for some firmware tests
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:54 moritzm: installing systemd bugfix update on mw canaries
  • 08:50 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 godog: roll restart rsyslog on cirrus and wdqs hosts to pick up changes to logback topic partitions
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312 after table compression', diff saved to https://phabricator.wikimedia.org/P9452 and previous config saved to /var/cache/conftool/dbconfig/20191023-082826-marostegui.json
  • 08:23 godog: roll restart logstash in codfw/eqiad to pick up new kafka partitions
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9451 and previous config saved to /var/cache/conftool/dbconfig/20191023-082246-marostegui.json
  • 08:11 godog: kafka-logging eqiad set 12 partitions for ^mwlog- ^logback- and eqiad.client.error topics
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s8 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9450 and previous config saved to /var/cache/conftool/dbconfig/20191023-080857-marostegui.json
  • 07:55 godog: kafka-logging delete unused topic syslog-notice
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9449 and previous config saved to /var/cache/conftool/dbconfig/20191023-075106-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s7 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9448 and previous config saved to /var/cache/conftool/dbconfig/20191023-074828-marostegui.json
  • 07:46 XioNoX: powering down cr2-esams for relocation (for real this time)
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9447 and previous config saved to /var/cache/conftool/dbconfig/20191023-073831-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s6 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9446 and previous config saved to /var/cache/conftool/dbconfig/20191023-073556-marostegui.json
  • 07:30 XioNoX: powering down cr2-esams for relocation
  • 07:28 hashar: logstash: refreshing index fields for logstash-* indices (via https://logstash.wikimedia.org/app/kibana#/management/kibana/indices/logstash-* ) # T234564
  • 07:05 XioNoX: redirect ns2 to eqiad - T235805
  • 07:04 marostegui: Enable slow query log 1/10 on db1089 (enwiki) T223151
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:59 XioNoX: depool esams - T235805
  • 06:57 effie: Depooling mw1317
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:38 marostegui: Compress tables on db1097:3315 T235599
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9445 and previous config saved to /var/cache/conftool/dbconfig/20191023-063800-marostegui.json
  • 05:29 ema@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kibana,name=codfw
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9444 and previous config saved to /var/cache/conftool/dbconfig/20191023-052940-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9443 and previous config saved to /var/cache/conftool/dbconfig/20191023-050812-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9442 and previous config saved to /var/cache/conftool/dbconfig/20191023-045722-marostegui.json
  • 04:49 vgutierrez: repool cp5007 - T234887
  • 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3315 after maintenance', diff saved to https://phabricator.wikimedia.org/P9441 and previous config saved to /var/cache/conftool/dbconfig/20191023-044833-marostegui.json
  • 04:36 MaxSem: Fixed a page title via namespaceDupes.php on pswiki
  • 03:51 vgutierrez: depool cp5007 - T234887
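
Entries in this log follow a regular "HH:MM actor: message" shape, and many reference Phabricator tasks as T-numbers. The following is a minimal Python sketch for splitting one entry into its parts and collecting the task IDs it mentions. The names `ENTRY_RE`, `TASK_RE`, and `parse_entry` are hypothetical helpers introduced for illustration and are not part of any Wikimedia tooling; the entry layout is an assumption inferred from the lines above.

```python
import re

# Assumed entry layout: optional bullet, "HH:MM", actor (no spaces or colons), ":", message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+(?P<actor>[^:\s]+):\s*(?P<message>.*)$"
)
# Phabricator task references look like "T" followed by digits.
TASK_RE = re.compile(r"\bT\d+\b")


def parse_entry(line):
    """Return a dict with time, actor, message, and tasks, or None if the line is not an entry."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # e.g. a date heading or a blank line
    entry = match.groupdict()
    entry["tasks"] = TASK_RE.findall(entry["message"])
    return entry


if __name__ == "__main__":
    print(parse_entry("  • 04:49 vgutierrez: repool cp5007 - T234887"))
```

Date headings such as "2019-10-22" do not match the pattern, so a caller can use a `None` result to detect where one day's entries end.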

2019-10-22

  • 23:57 maxsem@deploy1001: Synchronized php-1.35.0-wmf.3/includes/block/DatabaseBlock.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/545373/ (duration: 00m 59s)
  • 23:53 maxsem@deploy1001: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543943/ (duration: 01m 01s)
  • 23:43 maxsem@deploy1001: Synchronized dblists/: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 00m 59s)
  • 23:41 maxsem@deploy1001: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 01s)
  • 23:38 maxsem@deploy1001: Synchronized dblists/labtestwiki.dblist: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 02s)
  • 23:32 mutante: LDAP - added keepit-ssh to wmf group (T236209)
  • 22:23 ejegg: updated Fundraising CiviCRM from ff69d64ad4 to 47e0800001
  • 21:57 thcipriani: stopping gerrit to run ref-update script T236114
  • 21:57 thcipriani: stopping gerrit to run ref-update script
  • 21:45 mutante: LDAP - added lexnasser to nda group (T235688)
  • 21:07 eileen: process-control config revision is 95ee1bafb3 dedupe job re-enabled
  • 20:09 mutante: gerrit1001 - mkdir /srv/gerrit/cobalt/git - rsyncing /srv/gerrit/git from cobalt to /srv/gerrit/cobalt/git/ on gerrit1001 (T236114)
  • 19:42 hashar: gerrit1001: apt install colordiff # T236114
  • 19:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.3
  • 19:03 brennen: proceeding with train for 1.35.0-wmf.3
  • 18:09 mutante: DNS - added new Wikipedia language "mnw" (Mon) T235739 - a language spoken in Myanmar
  • 17:59 sbassett: Uploaded and applied (but did not deploy per releng) security fix for T234450 to wmf.3
  • 17:57 sbassett: Deployed security fix for T234450 to wmf.2
  • 17:57 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213) (duration: 05m 14s)
  • 17:54 mutante: restarting gerrit to disable jgit gc (T236114)
  • 17:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213)
  • 17:37 arlolra: Updated Parsoid to cf01d91 (T234057, T234768, T235296, T235684, T235563)
  • 17:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91 (duration: 07m 37s)
  • 17:20 bblack: geodns: re-pooling esams (at this point, we're entirely back in our "normal" state of affairs)
  • 17:19 arlolra@deploy1001: Started deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91
  • 16:51 bblack: geodns: moving all "normal" eqiad traffic back to eqiad (in addition to the esams-diverted traffic which is still pointed mostly at eqiad right now)
  • 16:21 mutante: running puppet on deployment servers
  • 16:20 thcipriani: restarting gerrit
  • 16:14 thcipriani: stopping gerrit to run a fix for T222391
  • 15:58 bblack: depooling esams temporarily to test traffic scenario on lvs1014
  • 15:47 bblack: enable pybal+puppet on rebooted lvs1014
  • 15:40 bblack: rebooting lvs1014
  • 15:28 liw@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache (duration: 37m 39s)
  • 15:26 XioNoX: repool esams
  • 15:20 XioNoX: rollback ns2 redirect
  • 15:13 bblack: re-disabling lvs1014 ...
  • 15:10 bblack: re-enabling lvs1014 pybal/puppet
  • 15:03 moritzm: rebooting kafka-main1005 for microcode debugging
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:52 bblack: stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016)
  • 14:50 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache
  • 14:45 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0 (duration: 02m 44s)
  • 14:42 mbsantos@deploy1001: Started deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0
  • 14:13 XioNoX: restart asw-esams for onsite work
  • 13:52 andrewbogott: restarted slapd on ldap-eqiad-replica01
  • 13:38 gehel: silencing LVS check for kartotherian (we know there is an issue) - T236163
  • 13:35 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_2419219323" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 06m 40s)
  • 13:28 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.3 and rebuild l10n cache
  • 13:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 XioNoX: depool esams for onsite work - T235805
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1096:3316 db1105:3311 db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9434 and previous config saved to /var/cache/conftool/dbconfig/20191022-130556-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9433 and previous config saved to /var/cache/conftool/dbconfig/20191022-125435-marostegui.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9432 and previous config saved to /var/cache/conftool/dbconfig/20191022-124607-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1096:3316 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9431 and previous config saved to /var/cache/conftool/dbconfig/20191022-123757-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3312 and db1105:3311 after on-site maintenance T235877', diff saved to https://phabricator.wikimedia.org/P9430 and previous config saved to /var/cache/conftool/dbconfig/20191022-123257-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315', diff saved to https://phabricator.wikimedia.org/P9429 and previous config saved to /var/cache/conftool/dbconfig/20191022-123032-marostegui.json
  • 12:29 moritzm: rebooting miscweb2001 for some microcode tests
  • 12:28 marostegui: Compress db1096:3315
  • 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 after PDU maintenance T227142 (duration: 00m 50s)
  • 12:15 jynus: reimage to buster dbmonitor2001.wikimedia.org T224589
  • 11:57 liw: starting to cut branch for train 1.35-wmf.3
  • 11:51 hashar: Restarted CI Jenkins on contint1001
  • 11:35 marostegui: Stop MySQL on db1105:3311, db1105:3312 for firmware upgrade - T235877
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312 for firmware upgrade T235877', diff saved to https://phabricator.wikimedia.org/P9428 and previous config saved to /var/cache/conftool/dbconfig/20191022-113437-marostegui.json
  • 11:29 Urbanecm: EU SWAT done
  • 11:28 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor/: SWAT: 2bc4420 (T235707); 680a98b (T233320); d83265d (T234564) (duration: 00m 53s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0593f34: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections (T230614) (duration: 00m 54s)
  • 10:55 moritzm: rebooting rpki2001 for some microcode tests
  • 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:37 ema@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kibana
  • 10:32 jynus: shutting down db1115 in preparation for PDU maintenance, this will make tendril and dbtree unavailable for 2 hours T227142
  • 10:21 ema: lvs2003: restart pybal to add new service kibana-ssl T210411
  • 10:18 ema: lvs1015: restart pybal to add new service kibana-ssl T210411
  • 10:14 ema: puppetmaster1001: rm /var/run/confd-template/.kibana-ssl*.err to make confd icinga check happy T210411
  • 10:02 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=kibana-ssl
  • 09:54 ema: lvs2006: restart pybal to add new service kibana-ssl T210411
  • 09:54 ema: lvs1016: restart pybal to add new service kibana-ssl T210411
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9425 and previous config saved to /var/cache/conftool/dbconfig/20191022-091327-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights to x100 on s4 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9424 and previous config saved to /var/cache/conftool/dbconfig/20191022-091051-marostegui.json
  • 08:05 marostegui: Stop MySQL on labsdb1012 for PDU work T227142
  • 07:53 marostegui: Stop MySQL on db1116 pc1007 db1096:3315, db1096:3316 for PDU maintenance T227142
  • 07:18 moritzm: installing tcpdump security updates
  • 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1010 T227142 (duration: 00m 52s)
  • 06:32 vgutierrez: rolling restart of ats-tls - T233274 T234803
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9423 and previous config saved to /var/cache/conftool/dbconfig/20191022-055151-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1070 from config T235464', diff saved to https://phabricator.wikimedia.org/P9422 and previous config saved to /var/cache/conftool/dbconfig/20191022-054759-marostegui.json
  • 05:41 marostegui: Stop mysql on db1070 - T235464
  • 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1070 from config T235464 (duration: 00m 51s)
  • 05:40 marostegui: Remove db1070 from tendril and zarcillo - T235464
  • 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1070 from config T235464 (duration: 00m 53s)
  • 05:33 vgutierrez: Switch from nginx to ats-tls on cp1090 - T231433
  • 05:24 vgutierrez: repooling cp2025 - T231433
  • 05:20 vgutierrez: depooling cp2025 to fix ATS/nginx configuration - T231433
  • 05:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:08 vgutierrez: Switch from nginx to ats-tls on cp1088 - T231433
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9421 and previous config saved to /var/cache/conftool/dbconfig/20191022-050204-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9420 and previous config saved to /var/cache/conftool/dbconfig/20191022-050048-marostegui.json
  • 04:58 vgutierrez: Switch from nginx to ats-tls on cp2026 - T231433
  • 04:30 vgutierrez: Switch from nginx to ats-tls on cp2024 - T231433
  • 04:18 vgutierrez: Switch from nginx to ats-tls on cp3049 - T231433
  • 03:44 vgutierrez: Switch from nginx to ats-tls on cp3047 - T231433
  • 01:12 eileen: disabled dedupe job pending T236096 deploy
  • 01:12 eileen: process-control config revision is 782a14c7d9

2019-10-21

  • 23:15 thcipriani: ops/puppet:sudo -u gerrit2 git update-ref refs/changes/66/535966/meta d6909e0 && sudo -u gerrit2 git update-ref refs/changes/66/535966/meta 8494c28 on gerrit1001
  • 23:11 mutante: rsynced operations/puppet.git/objects from cobalt to gerrit1001 (and backup in /root) (T222391)
  • 22:23 mutante: mw1340 - restarting php7.2-fpm, restarting apache2
  • 21:27 mutante: gerrit1001 manually running command from "list_mediawiki_extensions" cron (T222391)
  • 21:26 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b 30 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 21:23 thcipriani: ssh -p 29418 gerrit.wikimedia.org -- gerrit index start changes --force
  • 21:21 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2, ran puppet again. gerrit back up (T222391)
  • 21:18 mutante: copied apache config for gerrit.wm.org site from cobalt to gerrit1001, restarted apache2
  • 21:16 cdanis: previous cumin invocation was to unblock gerrit migration; will be automatically restored to usual on next puppet run. T222391
  • 21:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin A:dns-auth 'perl -p -i".bak" -e "s/gerrit\./gerrit-replica./" /etc/wikimedia-authdns.conf'
  • 20:57 mutante: running puppet on gerrit1001
  • 20:57 thcipriani: running puppet on cobalt
  • 20:52 mutante: rsyncing gerrit-data/plugins and /var/lib/gerrit2/review_site/ again
  • 20:51 mutante: rsyncing gerrit-data/git again
  • 20:50 thcipriani: stopping gerrit on cobalt
  • 20:44 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 52s)
  • 20:37 mutante: disabled puppet on cobalt and gerrit2001
  • 20:29 mutante: running puppet on dbproxy10017 to apply ferm change for gerrit db from gerrit1001 (T222391)
  • 20:25 mutante: gerrit1001 - puppet agent disabled - gerrit service stopped
  • 20:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f (duration: 06m 02s)
  • 20:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0c6d34b]: Update mobileapps to d6a6e7f
  • 20:12 mutante: rsyncing /var/lib/gerrit2/review_site from cobalt to gerrit1001 (T222391)
  • 20:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/545027/ T235949 (duration: 00m 52s)
  • 20:08 mutante: rsynced /srv/gerrit/plugins from cobalt to gerrit1001 (T222391)
  • 20:08 mutante: rsynced /srv/gerrit/git from cobalt to gerrit1001 (T222391)
  • 18:43 Urbanecm: Morning SWAT done
  • 18:41 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/VisualEditor: SWAT: a4ab456: TreeModifier: Ignore removed nodes properly when normalizing from a text node (T235959); ecb4532: Update VE core submodule to a4ab456dc0 (T235959); a850cee: ApiVisualEditor: Always return etag with content (T233320) (duration: 00m 55s)
  • 18:32 robh: ps1-23-ulsfo back online, all pdu work in ulsfo is now complete T235911
  • 18:30 robh: ps1-22-ulsfo repaired (reseating its NIC rebooted its mgmt interface) Done with it and repeating on ps1-23-ulsfo via T235911
  • 18:24 robh: working on ps1-22-ulsfo via T235911 (it may flap but it is already ack'd as down in icinga, but not persistent)
  • 17:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@75c0577]: GUI Updates (duration: 11m 37s)
  • 17:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/: Update VisualEditor for set of back-ports in wmf.1 T233320, T234564, T235959 (duration: 00m 56s)
  • 17:01 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@75c0577]: GUI Updates
  • 14:16 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.2 refs T233850
  • 13:46 Urbanecm: Deploy sec patch for T104807
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3314 and db2091:3312 for table compression', diff saved to https://phabricator.wikimedia.org/P9412 and previous config saved to /var/cache/conftool/dbconfig/20191021-132633-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9411 and previous config saved to /var/cache/conftool/dbconfig/20191021-132440-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights 1/2 to 100/200 on s2 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9410 and previous config saved to /var/cache/conftool/dbconfig/20191021-132145-marostegui.json
  • 13:07 ema: lvs1015: restart pybal to add new service wdqs-ssl T210411
  • 13:04 marostegui: Deploy schema change on db1122 (s2 primary master) - T233135 T234066
  • 13:04 ema: lvs2003: restart pybal to add new service wdqs-ssl T210411
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312 after schema change and remove db1129 from vslow and dump as it was there temporarily', diff saved to https://phabricator.wikimedia.org/P9409 and previous config saved to /var/cache/conftool/dbconfig/20191021-130355-marostegui.json
  • 13:02 ema: lvs1016: restart pybal to add new service wdqs-ssl T210411
  • 13:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wdqs-ssl
  • 12:58 ema: lvs2006: restart pybal to add new service wdqs-ssl T210411
  • 12:38 hashar: Started zuul-merger on contint2001
  • 12:32 hashar: Stopped zuul-merger on contint2001
  • 12:31 hashar: Started zuul-merger on contint1001
  • 12:16 hashar: Stopped zuul-merger on contint1001
  • 12:02 Urbanecm: EU SWAT finally done
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e8d70c1: Partial cleanup of InitialiseSettings (T231178) (duration: 01m 00s)
  • 12:00 Urbanecm: I'm going to do one last sync for EU SWAT
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 12e3549: Create Portal namespace for sawikisource (T235343) (duration: 00m 59s)
  • 11:55 urbanecm@deploy1001: sync-file aborted: SWAT: 12e3549: Create Portal namespace for sawikisource (duration: 00m 01s)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3b1350b: wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there (T235904) (duration: 00m 59s)
  • 11:49 Urbanecm: Reopen EU SWAT
  • 11:42 awight: EU SWAT complete
  • 11:42 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Put reference previews back into beta mode on beta cluster (T233813) (duration: 01m 00s)
  • 11:38 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 543764|Enable ContentTranslation out of Beta in Malayalam/Bengali/Mongolian WPs (T233008, T233009, T234317) (duration: 01m 00s)
  • 11:34 moritzm: installing Java security updates on restbase-dev1004
  • 11:30 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/tests/phpunit/includes/Storage/SqlBlobStoreTest.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 3/3 - T235188 (duration: 01m 00s)
  • 11:28 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/libs/objectcache/wancache/WANObjectCache.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 2/3 - T235188 (duration: 00m 59s)
  • 11:25 mobrovac@deploy1001: Synchronized php-1.35.0-wmf.2/includes/Storage/SqlBlobStore.php: SqlBlobStore HOT FIX: remove caching from getBlobBatch; file 1/3 - T235188 (duration: 01m 00s)
  • 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:19 hashar: contint1001 / contint2001 : marking integration/config zuul merger repo readonly: sudo chown -R root:root /srv/zuul/git/integration/config
  • 10:13 hashar: CI in trouble due to a huge number of changes
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:51 Amir1: maintenance script is done
  • 09:35 moritzm: removing PHP 7.0 from deployment servers
  • 09:20 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T234774)
  • 09:18 moritzm: installing php7.0 security updates
  • 09:11 moritzm: installing subversion updates on Stretch (fixes compatibility with security fix for Apache update)
  • 09:07 moritzm: installing jackson-databind security updates
  • 09:01 moritzm: installing openjpeg2 security updates
  • 08:52 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/544209
  • 08:34 Urbanecm: Deploy security patch (T234862)
  • 08:34 vgutierrez: Switch from nginx to ats-tls on cp2022 - T231627
  • 08:30 ema: pool cp4029 with ATS backend T227432
  • 08:20 vgutierrez: Switch from nginx to ats-tls on cp2020 - T231627
  • 08:09 vgutierrez: Switch from nginx to ats-tls on cp2018 - T231627
  • 08:08 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 08:03 godog: swift codfw-prod: final weight to ms-be205[1-6] - T233638
  • 07:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:57 vgutierrez: Switch from nginx to ats-tls on cp3046 - T231627
  • 07:57 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:50 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4029.ulsfo.wmnet,service=ats-be
  • 07:45 moritzm: installing aspell security updates on jessie
  • 07:43 vgutierrez: Switch from nginx to ats-tls on cp3045 - T231627
  • 07:35 moritzm: installing openjdk-11 security updates
  • 07:32 ema: depool cp4029 and reimage as text_ats T227432
  • 07:15 vgutierrez: Switch from nginx to ats-tls on cp1075 - T231627
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool non partitioned db1089 into s1 special slaves to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9406 and previous config saved to /var/cache/conftool/dbconfig/20191021-070655-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9405 and previous config saved to /var/cache/conftool/dbconfig/20191021-070352-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights from 1 to 100 on s1 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9404 and previous config saved to /var/cache/conftool/dbconfig/20191021-070119-marostegui.json
  • 06:59 vgutierrez: Switch from nginx to ats-tls on cp2001 - T231627
  • 06:46 vgutierrez: Switch from nginx to ats-tls on cp3030 - T231627
  • 06:28 vgutierrez: Install python3-cryptography-2.6.1-3+deb10u2 on acme-chief hosts - T234131
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9403 and previous config saved to /var/cache/conftool/dbconfig/20191021-061518-marostegui.json
  • 06:12 vgutierrez: Switch cp1086 from nginx to ats-tls - T231433
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1130 on s5 to check for slow queries T223151', diff saved to https://phabricator.wikimedia.org/P9402 and previous config saved to /var/cache/conftool/dbconfig/20191021-055843-marostegui.json
  • 05:54 vgutierrez: Switch cp2017 from nginx to ats-tls - T231433
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9401 and previous config saved to /var/cache/conftool/dbconfig/20191021-055017-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2048 and db2061, those hosts will be decommissioned T228258', diff saved to https://phabricator.wikimedia.org/P9400 and previous config saved to /var/cache/conftool/dbconfig/20191021-054340-marostegui.json
  • 05:42 _joe_: slowly removing service objects from production etcd T233973
  • 05:38 vgutierrez: Switch cp3044 from nginx to ats-tls - T231433
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1105:3311', diff saved to https://phabricator.wikimedia.org/P9399 and previous config saved to /var/cache/conftool/dbconfig/20191021-053737-marostegui.json
  • 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui: Compress tables on db2084:3314 db2091:3312 - T235599
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P9398 and previous config saved to /var/cache/conftool/dbconfig/20191021-052643-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312 db2084:3315 - T235599', diff saved to https://phabricator.wikimedia.org/P9397 and previous config saved to /var/cache/conftool/dbconfig/20191021-052527-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P9396 and previous config saved to /var/cache/conftool/dbconfig/20191021-052035-marostegui.json
  • 05:19 vgutierrez: Switch cp4026 from nginx to ats-tls - T231433
  • 05:14 marostegui: Deploy schema change on db1090:3312 T234066 T233135
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312 for schema change and pool db1129 temporarily in vslow, dump', diff saved to https://phabricator.wikimedia.org/P9395 and previous config saved to /var/cache/conftool/dbconfig/20191021-051356-marostegui.json
  • 05:09 marostegui: Deploy schema change on s7 primary master db1062 - T234066 T233135
  • 04:57 vgutierrez: Switch cp5006 from nginx to ats-tls - T231433
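
Each dbctl commit entry above records where the previous configuration was saved, in a file whose name appears to combine a date, a time, and the committing user (for example 20191021-051356-marostegui.json). The sketch below, in Python, pulls that timestamp and user name out of an entry. The helper names `SNAPSHOT_RE` and `snapshot_info` are hypothetical, and the reading of the file name as YYYYMMDD-HHMMSS-user is an assumption based only on the entries in this log, not on dbctl documentation.

```python
import re
from datetime import datetime

# Assumed file name layout: <YYYYMMDD>-<HHMMSS>-<user>.json under the dbconfig cache directory.
SNAPSHOT_RE = re.compile(
    r"/var/cache/conftool/dbconfig/(\d{8}-\d{6})-([\w.-]+)\.json"
)


def snapshot_info(message):
    """Return (timestamp, user) for the saved config named in a log message, or None."""
    match = SNAPSHOT_RE.search(message)
    if match is None:
        return None  # not a dbctl commit entry
    # The timestamp is parsed as a naive datetime; no timezone is assumed here.
    stamp = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    return stamp, match.group(2)


if __name__ == "__main__":
    msg = (
        "dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to "
        "https://phabricator.wikimedia.org/P9396 and previous config saved to "
        "/var/cache/conftool/dbconfig/20191021-052035-marostegui.json"
    )
    print(snapshot_info(msg))
```

Comparing the parsed timestamp with the entry's own HH:MM is one way to sanity-check that a saved config belongs to the commit that mentions it.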

2019-10-19

  • 08:41 XioNoX: add user papaul to fasw-c-eqiad
  • 00:05 mutante: LDAP - adding verenali to wmde and nda groups, to match raja_wmde (T233807, T231677)

2019-10-18

  • 22:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet,service=parsoid-php
  • 22:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet,service=parsoid-php
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet,service=parsoid-php
  • 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet,service=parsoid-php
  • 22:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet,service=parsoid-php
  • 22:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet,service=parsoid-php
  • 22:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet,service=parsoid-php
  • 22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet,service=parsoid-php
  • 22:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet,service=parsoid-php
  • 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet,service=parsoid-php
  • 22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet,service=parsoid-php
  • 22:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet,service=parsoid-php
  • 22:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2032.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet,service=parsoid-php
  • 22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet,service=parsoid-php
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet,service=parsoid-php
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet,service=parsoid-php
  • 21:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet,service=parsoid-php
  • 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet,service=parsoid-php
  • 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet,service=parsoid-php
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet,service=parsoid-php
  • 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet,service=parsoid-php
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet,service=parsoid-php
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
  • 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet,service=parsoid-php
  • 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet,service=parsoid-php
  • 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet,service=parsoid-php
  • 20:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet,service=parsoid-php
  • 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet,service=parsoid-php
  • 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet,service=parsoid-php
  • 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet,service=parsoid-php
  • 19:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet,service=parsoid-php
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet,service=parsoid-php
  • 18:27 mutante: temp. disabled puppet on all wtp* servers, adding mediawiki appserver roles on them incrementally by re-enabling puppet, starting with wtp1026, scheduled icinga downtime for wtp* all services (T233654)
  • 18:19 mutante: temp. disabling puppet on all wtp* servers
  • 15:40 Urbanecm: Reassign edits from DannyS712 (T235446) to DannyS712 at banwiki (T235446)
  • 15:38 Urbanecm: Run extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=banwiki DannyS712 (T235446)
  • 15:38 Urbanecm: Rename DannyS712@banwiki to DannyS712 (T235446) locally (T235446)
  • 15:07 Urbanecm: Reattach DannyS712@banwiki to DannyS712@SUL (T235446)
  • 14:19 _joe_: uploading cassandra 3.11.4 to stretch-wikimedia
  • 14:10 marostegui: Run compare.py on db1105 - T235877
  • 13:48 jynus: disabled notifications on db1105
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 and db1105:3312 host rebooted itself', diff saved to https://phabricator.wikimedia.org/P9392 and previous config saved to /var/cache/conftool/dbconfig/20191018-134517-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2059 from config, host decommissioned', diff saved to https://phabricator.wikimedia.org/P9391 and previous config saved to /var/cache/conftool/dbconfig/20191018-132934-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084:3315 for tables compression T235599', diff saved to https://phabricator.wikimedia.org/P9390 and previous config saved to /var/cache/conftool/dbconfig/20191018-130253-marostegui.json
  • 13:01 marostegui: Compress db2084:3315 T235599
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P9389 and previous config saved to /var/cache/conftool/dbconfig/20191018-123930-marostegui.json
  • 12:20 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:10 jbond42: disable puppet on puppetmasters to fix puppet-merge
  • 11:58 moritzm: installing sudo security updates for jessie
  • 11:56 Reedy: `mwscript refreshLinks.php banwiki` on mwmaint1002 T235843
  • 11:10 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4028.ulsfo.wmnet,service=ats-be
  • 10:56 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet - T234175
  • 10:53 effie: Updating wikidiff2 to 1.9.0-2~wmf1 and slowly restart php-fpm across the fleet
  • 10:49 effie: Uploading wikidiff2_1.9.0-2~wmf1 to stretch-wikimedia T231586
  • 09:58 moritzm: rolling out debdeploy 0.0.99.12 fleet-wide
  • 09:57 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=echostore
  • 09:40 _joe_: restarting pybal on lvs1015 to pick up the addition of echostore
  • 09:37 ema: pool cp4028 with ATS backend T227432
  • 09:36 _joe_: restarting pybal on lvs2003 to pick up the addition of echostore
  • 09:34 _joe_: restarting pybal on lvs1016 to pick up the addition of echostore
  • 09:20 _joe_: restarting pybal on lvs2006 to pick up the addition of echostore
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:16 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=echostore
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:14 moritzm: importing debdeploy 0.0.99.12 to apt.wikimedia.org
  • 09:13 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 09:12 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:11 _joe_: hotpatching puppet-merge on puppetmaster1001
  • 08:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:32 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:03 ema: depool cp4028 and reimage as text_ats T227432
  • 07:58 marostegui: Deploy schema change on db1076
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P9388 and previous config saved to /var/cache/conftool/dbconfig/20191018-075709-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P9387 and previous config saved to /var/cache/conftool/dbconfig/20191018-075529-marostegui.json
  • 07:21 moritzm: installing unbound security updates on buster
  • 07:20 moritzm: installing libdatetime-timezone-perl updates (time zone updates)
  • 05:53 vgutierrez: switch cp1084 from nginx to ats-tls - T231433
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:32 vgutierrez: switch cp2014 from nginx to ats-tls - T231433
  • 05:19 marostegui: Rename m5 labtestwiki database - T233236
  • 05:15 marostegui: Deploy schema change on db1129 T233135 T234066
  • 05:15 marostegui: Compress tables on db2091:3314 T235599
  • 05:14 vgutierrez: switch cp3039 from nginx to ats-tls - T231433
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P9386 and previous config saved to /var/cache/conftool/dbconfig/20191018-051355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 and db2086:3318 after table compression', diff saved to https://phabricator.wikimedia.org/P9385 and previous config saved to /var/cache/conftool/dbconfig/20191018-050831-marostegui.json
  • 04:57 vgutierrez: switch cp4025 from nginx to ats-tls - T231433
  • 04:34 vgutierrez: switch cp5005 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: restarting nagios-nrpe-server on stat1007

2019-10-17

  • 21:42 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673 (duration: 05m 38s)
  • 21:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d663006]: Update mobileapps to f345673
  • 19:31 eileen: civicrm revision changed from 4eac801762 to ff69d64ad4, config revision is dc3a88889d
  • 18:26 mutante: wtp1025 - cd /srv/deployment/parsoid/deploy/src ; sudo -u deploy-service ln -s ../vendor (for benchmarking test)
  • 18:01 _joe_: depooled wtp1025 from parsoid, parsoid-php to allow running benchmarks there
  • 18:01 elukey: update librdkafka on eventlog1002 and restart eventlogging
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 and remove db1136 from its temporary vslow,dump role', diff saved to https://phabricator.wikimedia.org/P9382 and previous config saved to /var/cache/conftool/dbconfig/20191017-151952-marostegui.json
  • 15:07 dcausse: unbanning elastic1050:psi
  • 15:01 dcausse: dumping jvm heap on elastic1050:psi to investigate gc issues
  • 14:46 moritzm: installing 4.9.189 Linux update on jessie hosts (no reboots, deploying the package only at this point)
  • 14:37 dcausse: banning elastic1050:psi to investigate gc issues
  • 14:32 moritzm: uploaded linux-meta 1.22 for jessie-wikimedia
  • 14:32 bblack: disable puppet on cache fleet (cp*) ahead of cert deployment refactoring - T234803
  • 14:09 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro --restrict grafana update buster-wikimedia
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9381 and previous config saved to /var/cache/conftool/dbconfig/20191017-134112-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9380 and previous config saved to /var/cache/conftool/dbconfig/20191017-133047-marostegui.json
  • 13:06 XioNoX: rollback failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 12:56 XioNoX: restart mr1-eqiad
  • 12:54 XioNoX: downtiming all mgmt host for 30min (mr1-eqiad needs to be rebooted)
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9379 and previous config saved to /var/cache/conftool/dbconfig/20191017-125248-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9378 and previous config saved to /var/cache/conftool/dbconfig/20191017-125154-marostegui.json
  • 12:50 marostegui: Compress tables on db2088:3312 - T235599
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9377 and previous config saved to /var/cache/conftool/dbconfig/20191017-124503-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1090:3312 original weight', diff saved to https://phabricator.wikimedia.org/P9376 and previous config saved to /var/cache/conftool/dbconfig/20191017-121330-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9375 and previous config saved to /var/cache/conftool/dbconfig/20191017-121106-marostegui.json
  • 11:39 ema: pool cp4027 with ATS backend T227432
  • 11:36 vgutierrez: upgrading ATS on eqiad nodes to 8.0.5-1wm9 - T234011
  • 11:27 vgutierrez: upgrading ATS on codfw nodes to 8.0.5-1wm9 - T234011
  • 11:27 ema@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4027.ulsfo.wmnet,service=ats-be
  • 11:16 vgutierrez: upgrading ATS on esams nodes to 8.0.5-1wm9 - T234011
  • 11:11 Urbanecm: EU SWAT done
  • 11:11 XioNoX: failover vrrp from cr2-eqiad to cr1-eqiad - T227133
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 36d4612: Allow sysops to add transwiki on nnwiki, and add import sources (T231761) (duration: 00m 59s)
  • 11:09 vgutierrez: upgrading ATS on ulsfo nodes to 8.0.5-1wm9 - T234011
  • 11:08 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikibaseMediaInfo: SWAT: 5a67011: Keep track of assigned nodes in both old & new DOM (T235236) (duration: 01m 03s)
  • 10:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 ema@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:32 ema: depool cp4027 and reimage as text_ats T227432
  • 10:31 effie: depool mw1333
  • 10:25 elukey: rollback eventlogging back to Python 2, some errors (unseen in tests) logged by the processors
  • 10:24 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3 (duration: 00m 03s)
  • 10:24 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Rollback move codebase to Python3
  • 10:19 elukey: Move eventlogging on eventlog1002 to Python3
  • 10:17 elukey@deploy1001: Finished deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3 (duration: 00m 05s)
  • 10:17 elukey@deploy1001: Started deploy [eventlogging/analytics@0f0a1aa]: Move codebase to Python3
  • 09:57 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 09:39 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 09:38 marostegui: Stop MySQL on db1129 for PDU work
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for PDU work, give some traffic to db1090:3312 meanwhile T227133', diff saved to https://phabricator.wikimedia.org/P9374 and previous config saved to /var/cache/conftool/dbconfig/20191017-093753-marostegui.json
  • 09:27 elukey: upload archiva 2.2.4-1 to stretch-wikimedia - T222595
  • 09:26 marostegui: Stop MySQL on db1117 this will generate some haproxy alerts - T227133
  • 08:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:26 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:05 vgutierrez: upgrading ATS on eqsin nodes to 8.0.5-1wm9 - T234011
  • 08:03 marostegui: Deploy schema change on db1090:3317
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db1136 weight', diff saved to https://phabricator.wikimedia.org/P9373 and previous config saved to /var/cache/conftool/dbconfig/20191017-080157-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 pool db1136 temporarily into vslow,dump', diff saved to https://phabricator.wikimedia.org/P9372 and previous config saved to /var/cache/conftool/dbconfig/20191017-080026-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P9371 and previous config saved to /var/cache/conftool/dbconfig/20191017-074658-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 (non partitioned host) into s5 special group with low weight - T223151', diff saved to https://phabricator.wikimedia.org/P9370 and previous config saved to /var/cache/conftool/dbconfig/20191017-071308-marostegui.json
  • 06:06 elukey: upgrade archiva on archiva1001 to 2.2.4 - T222595
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from x to x100 on s5 - T231018', diff saved to https://phabricator.wikimedia.org/P9369 and previous config saved to /var/cache/conftool/dbconfig/20191017-060251-marostegui.json
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui: Deploy schema change on labtestwiki and labswiki
  • 05:12 marostegui: Deploy schema change on db1095:3312
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 and db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P9368 and previous config saved to /var/cache/conftool/dbconfig/20191017-051055-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 and db1094', diff saved to https://phabricator.wikimedia.org/P9367 and previous config saved to /var/cache/conftool/dbconfig/20191017-050614-marostegui.json
  • 05:01 vgutierrez: upgrading ATS to 8.0.5-1wm9 on cp5001 - T234011
  • 05:00 vgutierrez: uploaded trafficserver 8.0.5-1wm9 to apt.wikimedia.org (stretch) - T234011
  • 02:04 bblack: repooling eqsin
  • 00:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2019-10-16

  • 23:17 Urbanecm: Evening SWAT done
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: Clean expired rules (duration: 00m 58s)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-1.5x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki-2x.png (T235710)
  • 23:14 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/azwiki.png (T235710)
  • 23:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 9c5bcd8: Change logo for azwiki (T235710) (duration: 00m 59s)
  • 23:11 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6dc4c0c: New throttle rule for WMCL editathon (T235693) (duration: 00m 59s)
  • 23:09 @: helmfile [EQIAD] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96c87c7: Enable transwiki import from other Wikipedias on srwikisource (T235419) (duration: 00m 58s)
  • 23:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 23:00 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 22:42 James_F: Zuul: Add composer-php72-docker for wikimedia-cz/web-theme and wikimedia-cz/web-plugin
  • 22:31 mutante: mwmaint1002 - running generate-fancy-captcha-loop to work around issue with generate-captcha cron (T230245)
  • 22:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/resources/src/mediawiki.special/contributions.less: T235137 Don't apply styling for Special:Contributions on other pages (duration: 00m 59s)
  • 22:29 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/OutputPage.php: T235711 Lower severity of targets violation back to DEBUG (duration: 00m 59s)
  • 21:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/WikiEditor: T235701 Revert removal of jquery.tabIndex (duration: 00m 59s)
  • 21:47 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:44 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:42 @: helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 21:41 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 21:10 @: helmfile [CODFW] Ran 'apply' command on namespace 'echostore' for release 'production' .
  • 20:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'echostore' for release 'staging' .
  • 20:41 ejegg: rolled back fundraising python tools from 31171f148c to b3c7453be2
  • 20:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.2/includes/resourceloader/ResourceLoaderStartUpModule.php: Expose StartupModule::getConfigSettings for internal use T235350 T229836 (duration: 00m 59s)
  • 20:07 joal@deploy1001: Finished deploy [analytics/refinery@1704fdd]: Regular analytics weekly train (duration: 17m 06s)
  • 20:00 urandom: upgrading Cassandra to 3.11.4, codfw, rack d -- T200803
  • 19:50 joal@deploy1001: Started deploy [analytics/refinery@1704fdd]: Regular analytics weekly train
  • 19:35 urandom: upgrading Cassandra to 3.11.4, codfw, rack c -- T200803
  • 19:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.34.0-wmf.25 (duration: 03m 24s)
  • 19:18 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix (duration: 05m 53s)
  • 19:13 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train - try 2 after fix
  • 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.2 refs T233850 (duration: 00m 59s)
  • 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.2 refs T233850
  • 19:06 joal@deploy1001: Finished deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint) (duration: 01m 18s)
  • 19:05 joal@deploy1001: Started deploy [analytics/aqs/deploy@59a97fa]: Regular analytics weekly train (top-mediarequest endpoint)
  • 18:46 urandom: upgrading Cassandra to 3.11.4, codfw, rack b -- T200803
  • 18:28 urandom: upgrading Cassandra to 3.11.4, eqiad, rack d -- T200803
  • 18:06 urandom: upgrading Cassandra to 3.11.4, eqiad, rack b -- T200803
  • 16:33 urandom: upgrading Cassandra to 3.11.4, eqiad, rack a -- T200803
  • 16:17 catrope@deploy1001: Synchronized php-1.35.0-wmf.2/extensions/GrowthExperiments/: Fix help panel button alignment (T235578) (duration: 01m 02s)
  • 16:16 mutante: ganeti1003 - shutting down and removing instance moscovium.eqiad.wmnet - recreating under same name with cookbook
  • 15:59 mutante: new dsh group parsoid_php created - parsoid-php servers added to scap / mediawiki-installation dsh group
  • 15:17 marostegui: Deploy schema change on dbstore1004:3312 - T234066 T233135
  • 15:09 marostegui: Recreate views for protected_titles on s2 and s7 on labsdb1009 and labsdb1012 - T233135
  • 15:04 mutante: wtp1025 wtp2001 - scap pull (T233654)
  • 15:04 mutante: wtp parsoid servers added to conftool - wtp1025 and wtp2001 pooled in new service parsoid-php (T233654)
  • 15:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet,service=parsoid-php
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet,service=parsoid-php
  • 14:53 effie: Remove tex* and math related packages from deploy*,mwmaint*,snapshot* - T195847
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:26 papaul: power down puppetmaster2001 for HW maintenance
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:24 oblivian@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:24 _joe_: creating namespaces and policies for echostore in codfw, T234376
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:18 oblivian@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:10 moritzm: installing idp2001
  • 13:56 jynus: reenabling puppet on helium T229209
  • 13:46 XioNoX: rollback failover VRRP from cr1-eqiad to cr2-eqiad - T226782
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 and db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P9364 and previous config saved to /var/cache/conftool/dbconfig/20191016-132620-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P9363 and previous config saved to /var/cache/conftool/dbconfig/20191016-131010-marostegui.json
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P9362 and previous config saved to /var/cache/conftool/dbconfig/20191016-125102-marostegui.json
  • 12:38 effie: remove tex* and math related packages from appserver canaries - T195847
  • 12:30 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540 (duration: 03m 40s)
  • 12:29 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:26 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: redeploy 0.3.4-SNAPSHOT - T235540
  • 12:20 marostegui: Compress tables on db1099:3311 - T235599
  • 12:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c90503b]: Revert to fix T235540 (duration: 19m 09s)
  • 12:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:00 kart_: Updated cxserver to 2019-10-15-091114-production (T234773, T217585)
  • 11:57 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:56 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c90503b]: Revert to fix T235540
  • 11:49 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT (duration: 10m 13s)
  • 11:46 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:39 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@5b42bdf]: Revert wdqs 0.3.4-SNAPSHOT
  • 11:34 Lucas_WMDE: EU SWAT done
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: extension-list: Load FlaggedRevs via extension.json (T87915, T139800, T140852) (duration: 01m 05s)
  • 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure Citoid+Wikibase integration on Test Wikidata (T228412) (duration: 01m 13s)
  • 11:14 _joe_: purging confd from wtp* servers, not needed anymore
  • 10:48 _joe_: upgrading confd to 0.16.0 across the cluster. T147204. confd will be restarted on the next puppet run
  • 10:31 elukey: upload prometheus-memcached-exporter 0.4.1+git20181010.2fa99eb-1+deb10u1 to buster-wikimedia - T213089
  • 10:17 marostegui: Stop replication on s2 codfw master for schema change and to modify sanitarium triggers T234066 T233135 T234704
  • 09:40 effie: enable puppet on all hosts running hhvm - T229792
  • 09:36 XioNoX: restart fastnetmon on netflow2001
  • 09:27 effie: Disable puppet on all hosts running hhvm to merge 543131 - T229792
  • 09:22 effie: Disable puppet on mw* hosts to merge 543131
  • 09:20 gehel: force merging commonswiki_content on elasticsearch codfw
  • 08:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:15 _joe_: upgrading envoyproxy in production to 1.11.2 T235412
  • 05:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P9360 and previous config saved to /var/cache/conftool/dbconfig/20191016-052104-marostegui.json
  • 05:18 marostegui: Deploy schema change on s2 sanitarium master (db1074) this will create lag on s2 labsdb T233135 T234066
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P9359 and previous config saved to /var/cache/conftool/dbconfig/20191016-051812-marostegui.json
  • 05:14 marostegui: Change s7 triggers for archive table from db1125:3317 T234704
  • 05:11 marostegui: Change s2 triggers for archive table from db1125:3312 T234704
  • 05:08 marostegui: Deploy schema change on s7 sanitarium master (db1079) this will create lag on s7 labsdb T233135 T234066
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P9358 and previous config saved to /var/cache/conftool/dbconfig/20191016-050627-marostegui.json
  • 03:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465 (duration: 13m 37s)
  • 03:35 mobrovac@deploy1001: Started deploy [restbase/deploy@320f3a5]: Parsoid: Use the ETag for retrieving stashed content - T235465
  • 01:55 eileen: civicrm revision changed from 5a2f8048c4 to 4eac801762, config revision is dc3a88889d
  • 00:09 mutante: wikitech - make JBond a "content administrator" to give the ability to create server fingerprint pages

2019-10-15

  • 22:41 Reedy: manually running `extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php` T230245
  • 21:26 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Provide getCachableMWConfig() which doesn't rely on wgConf (duration: 01m 00s)
  • 21:24 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408) (duration: 05m 35s)
  • 21:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@cdfa545]: Media: Fix TypeError when processing pages with only Mathoid images (T235408)
  • 21:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings: Stop writing wmgScoreFileBackend and wmgScorePath, never read (duration: 00m 59s)
  • 21:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Stop using wmg variables for Score extension (duration: 01m 01s)
  • 21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write wgScoreFileBackend and wgScorePath directly, not via CommonSettings (duration: 01m 00s)
  • 20:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.2 refs T233850
  • 19:55 urandom: upgrade restbase2011-{a,b,c} to cassandra 3.11.4 -- T200803
  • 19:52 urandom: upgrade restbase1016-c to cassandra 3.11.4 -- T200803
  • 19:48 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.2 refs T233850 (duration: 27m 39s)
  • 19:48 urandom: upgrade restbase1016-b to cassandra 3.11.4 -- T200803
  • 19:42 urandom: upgrade restbase1016-a to cassandra 3.11.4 -- T200803
  • 19:20 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.2 refs T233850
  • 19:07 mutante: LDAP - adding user rzl to groups wmf and ops (T235215)
  • 17:51 longma: cutting the branch for 1.35.0-wmf.2 T233850
  • 16:28 ejegg: updated payments-wiki from c3cc3ace2f to 570324a30f
  • 16:24 papaul: power down lvs2010 for HW maintenance
  • 16:00 _joe_: uploading envoy 1.11.2 to stretch-wikimedia, buster-wikimedia T230779 T235412
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9355 and previous config saved to /var/cache/conftool/dbconfig/20191015-155454-marostegui.json
  • 15:52 papaul: power down lvs2009 for HW maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P9354 and previous config saved to /var/cache/conftool/dbconfig/20191015-154325-marostegui.json
  • 15:17 ejegg: updated payments-wiki from 8a65f57874 to c3cc3ace2f
  • 15:01 moritzm: installing fribidi bugfix updates from stretch point release
  • 14:54 moritzm: installing cups security updates for stretch (client-side libs/tools only)
  • 14:43 elukey: start a root tmux containing a bash script on conf1004 to clean up znodes under /yarn-rmstore/analytics-hadoop/ZKRMStateRoot/RMAppRoot slowly - T217057
  • 14:40 papaul: power down puppetmaster2002 for HW maintenance
  • 14:38 moritzm: installing usbutils update from stretch point release
  • 14:34 elukey: executed 'rmr' in zookeeper on conf1004 for znodes /yarn-leader-election /hadoop-ha /hive_zookeeper_namespace
  • 14:12 ejegg: updated fundraising python tools from b3c7453be2 to 31171f148c
  • 13:53 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9353 and previous config saved to /var/cache/conftool/dbconfig/20191015-130356-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9352 and previous config saved to /var/cache/conftool/dbconfig/20191015-124942-marostegui.json
  • 12:46 elukey: Hadoop maintenance over
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9351 and previous config saved to /var/cache/conftool/dbconfig/20191015-123356-marostegui.json
  • 12:24 mobrovac: restbase add parsoidphp tables in prod - T230792
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9350 and previous config saved to /var/cache/conftool/dbconfig/20191015-121840-marostegui.json
  • 12:17 marostegui: Repool labsdb1009 after PDU maintenance
  • 12:17 elukey: Hadoop maintenance start - migration to the new Zookeeper cluster
  • 12:16 moritzm: installing sudo security updates on buster/stretch
  • 12:13 arturo: add copy of python-pykube and python3-pykube from stretch-wikimedia to buster-wikimedia (T230961)
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 hashar: CI Jenkins restarted
  • 12:04 hashar: Restarting CI Jenkins
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9348 and previous config saved to /var/cache/conftool/dbconfig/20191015-120359-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P9347 and previous config saved to /var/cache/conftool/dbconfig/20191015-120133-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9346 and previous config saved to /var/cache/conftool/dbconfig/20191015-115922-marostegui.json
  • 11:12 Urbanecm: EU SWAT done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ac37540: Add `autopatrol` to translation administrators on mediawiki (duration: 00m 51s)
  • 11:12 jbond42: move puppetmaster_ca_server back to puppetmaster1001
  • 11:08 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip 195.113.145.2 (T235493)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT:855aca4eb: Throttle rule for Czech course (T235493) (duration: 00m 51s)
  • 10:54 moritzm: mark ruby-safe-yaml as manually installed using apt-mark on jessie/stretch, prevents accidental removal of ruby-safe-yaml after puppet 4->5 migration
  • 10:07 moritzm: installing openssl updates for buster (some ciphers we don't use were not enabled due to an upstream change related to the selection of ASM-optimised implementations over generic C)
  • 08:07 marostegui: Stop MySQL on db1126 and labsdb1009 for PDU maintenance - T226782
  • 08:06 elukey: upload new version of memkeys (adding a patch to merged to upstream to avoid segfaults on stretch/buster) to stretch|buster wikimedia apt repos - T223863
  • 07:52 Urbanecm: Set email for `Martin Urbanec (test 10)` to test@wikimedia.cz (debug, no ticket)
  • 07:48 Urbanecm: Password reset for Xaris333 #2 (T235441)
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for PDU maintenance T226782', diff saved to https://phabricator.wikimedia.org/P9345 and previous config saved to /var/cache/conftool/dbconfig/20191015-071338-marostegui.json
  • 07:10 XioNoX: failover VRRP from cr1-eqiad to cr2-eqiad in anticipation of the PDU work - T226782
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 T232446', diff saved to https://phabricator.wikimedia.org/P9344 and previous config saved to /var/cache/conftool/dbconfig/20191015-064419-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1070 T235464', diff saved to https://phabricator.wikimedia.org/P9343 and previous config saved to /var/cache/conftool/dbconfig/20191015-064005-marostegui.json
  • 05:38 marostegui: Depool labsdb1009 for PDU maintenance T226782
  • 05:28 marostegui: Deploy schema change on db1098:3317 T234066 T233135
  • 05:28 marostegui: Deploy schema change on db1097:3314 T233625
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P9342 and previous config saved to /var/cache/conftool/dbconfig/20191015-052621-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P9341 and previous config saved to /var/cache/conftool/dbconfig/20191015-052220-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P9340 and previous config saved to /var/cache/conftool/dbconfig/20191015-051924-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P9339 and previous config saved to /var/cache/conftool/dbconfig/20191015-051400-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P9338 and previous config saved to /var/cache/conftool/dbconfig/20191015-051236-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Promote db1100 to s5 master and remove read-only from s5 T234300', diff saved to https://phabricator.wikimedia.org/P9337 and previous config saved to /var/cache/conftool/dbconfig/20191015-050042-marostegui.json
  • 05:00 marostegui@cumin2001: dbctl commit (dc=all): 'Set s5 as read-only for maintenance T234300', diff saved to https://phabricator.wikimedia.org/P9336 and previous config saved to /var/cache/conftool/dbconfig/20191015-050016-marostegui.json
  • 05:00 marostegui: Starting s5 failover from db1070 to db1100 - T234300
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P9335 and previous config saved to /var/cache/conftool/dbconfig/20191015-043403-marostegui.json
  • 04:15 marostegui: Start pre-switchover steps T234300
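
The dbctl commit entries above all share one shape: time, user@host, a quoted commit message, a paste URL for the diff, and the path of the saved previous config. A minimal Python sketch for pulling those fields out of a line, assuming only the format seen in this log; the regular expression and the `parse_dbctl` name are illustrative and not part of dbctl or conftool:

```python
import re

# Pattern inferred from the log lines above; an assumption, not a dbctl spec.
DBCTL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[\w-]+)@(?P<host>[\w-]+): "
    r"dbctl commit \(dc=(?P<dc>\w+)\): '(?P<message>.*)', "
    r"diff saved to (?P<diff>\S+) and previous config saved to (?P<config>\S+)"
)


def parse_dbctl(line):
    """Return a dict of fields for a dbctl commit entry, or None if no match."""
    match = DBCTL_RE.search(line)
    return match.groupdict() if match else None


entry = parse_dbctl(
    "04:34 marostegui@cumin1001: dbctl commit (dc=all): "
    "'Repool db1101:3317 after schema change', diff saved to "
    "https://phabricator.wikimedia.org/P9335 and previous config saved to "
    "/var/cache/conftool/dbconfig/20191015-043403-marostegui.json"
)
print(entry["message"], entry["diff"])
```

The greedy `.*` in the message group is deliberate: some commit messages contain an apostrophe (for example "tomorrow's failover"), so the match has to run up to the last `', diff saved to` rather than stop at the first quote.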

2019-10-14

  • 23:27 Krinkle: Delete 2019-09-01–2019-09-10 arclamp trace logs from webperf1002, and decompress the rest of 2019-09 (this will trigger svg re-generation), T235425
  • 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 86f12b6e (duration: 00m 51s)
  • 21:47 Krinkle: Deleting 2019-09-01–2019-09-10 arclamp logs on webperf2002, and decompress the rest of 2019-09, T235425
  • 21:12 Krinkle: Delete misc arclamp/logs and arclamp/svgs data from between 2018 and 2019-08 on webperf1002/webperf2002, T235425
  • 20:41 maxsem@deploy1001: Synchronized php-1.35.0-wmf.1/includes/: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/542963/ (duration: 00m 55s)
  • 17:56 mutante: webperf2002 - /srv/xenon/logs/daily# gzip 2019-09*excimer*.log (T235425)
  • 17:21 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates (duration: 16m 45s)
  • 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@217cac5]: New blazegraph build and GUI updates
  • 16:07 moritzm: imported cergen 0.2.4-1+deb10u3 to component/cergen for buster-wikimedia T235405
  • 16:00 Urbanecm: Password reset for Xaris333 (T235441)
  • 15:57 moritzm: imported cergen 0.2.4-1+deb10u2 to component/cergen for buster-wikimedia T235405
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9329 and previous config saved to /var/cache/conftool/dbconfig/20191014-142843-marostegui.json
  • 14:28 elukey: upload matomo 3.11 to stretch-wikimedia and upgrade matomo1001 - T234607
  • 14:21 marostegui: Deploy schema change on db1116:3317 T234066 T233135
  • 14:13 effie: Enable puppet on mw* servers and reload apache - T229792
  • 13:48 moritzm: imported cergen 0.2.4-1+deb10u1 to component/cergen for buster-wikimedia T235405
  • 13:42 marostegui: Repool labsdb1009 after PSU replacement - T233273
  • 13:36 effie: Slowly enable puppet on mw* canaries
  • 13:26 moritzm: imported python-networkx 1.11-2~wmf1 to component/cergen for buster-wikimedia T235405
  • 13:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:19 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:18 effie: Disable puppet on mw* to remove php72_only feature flag - T229792
  • 13:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 245b4e5: Add banwiki logo to IS.php (T234768) (duration: 00m 51s)
  • 13:12 Urbanecm: Run git reset --hard origin/master in /srv/mediawiki-staging (deleted https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542920 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/542919 from deployment srv, both don't actually change anything => safe to delete) (T234768)
  • 13:10 marostegui: Sanitize banwiki on db1124:3313 and db2094:3313 T234770
  • 12:44 Amir1: Creating banwiki is banned (done)
  • 12:40 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
  • 12:34 ladsgroup@deploy1001: Synchronized langlist: Creating banwiki: T234768 (duration: 00m 50s)
  • 12:32 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating banwiki: T234768 (duration: 00m 51s)
  • 12:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating banwiki: T234768
  • 12:20 ladsgroup@deploy1001: Synchronized dblists: Creating banwiki: T234768 (duration: 00m 52s)
  • 12:10 tarrow@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/Wikibase: SWAT: Bump up Termbox cache version (T235192) (duration: 00m 56s)
  • 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reftabs on testwikidata (T199197, T228412) (duration: 00m 51s)
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a295cc7: Fix wrong domain in wgCopyUploadDomains added in T203363 (T235415) (duration: 00m 51s)
  • 11:27 kart_: Update cxserver to 2019-10-03-054958-production (T232986)
  • 11:22 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:17 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 538867|Use ContentTranslationEnableMT to disable MT (T232986) (duration: 00m 51s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9326 and previous config saved to /var/cache/conftool/dbconfig/20191014-100758-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1130 into s5 api, db1100 will be removed later in preparation for tomorrow's failover T234300', diff saved to https://phabricator.wikimedia.org/P9325 and previous config saved to /var/cache/conftool/dbconfig/20191014-094809-marostegui.json
  • 09:34 hashar: Upgraded CI jobs to Quibble 0.0.38
  • 09:14 marostegui: Deploy schema change on dbstore1003:3317
  • 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:55 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:52 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 and db2126 after changing sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9322 and previous config saved to /var/cache/conftool/dbconfig/20191014-085143-marostegui.json
  • 08:46 mobrovac: restbase drop metadata keyspaces from cassandra - T235173
  • 07:54 marostegui: Stop db1074 and db2126 in sync to change sanitarium's master for s2 - T231638
  • 07:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata (duration: 03m 58s)
  • 07:45 mobrovac@deploy1001: Started deploy [restbase/deploy@4d469a1] (dev-cluster): Remove VE logging and stop using storage for /page/metadata
  • 07:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173 (duration: 13m 37s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db2126 to change sanitarium to replicate from db1074 T231638', diff saved to https://phabricator.wikimedia.org/P9320 and previous config saved to /var/cache/conftool/dbconfig/20191014-073319-marostegui.json
  • 07:28 mobrovac@deploy1001: Started deploy [restbase/deploy@e0d071f]: Remove VE logging and stop using storage for /page/metadata - T234928 T235173
  • 07:28 mobrovac@deploy1001: Finished deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173 (duration: 01m 25s)
  • 07:26 mobrovac@deploy1001: Started deploy [changeprop/deploy@c25a1c2]: Do not pre-generate /page/metadata - T235173
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json
  • 07:16 marostegui: Stop MySQL on labsdb1009 for on-site maintenance - T233273
  • 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)
  • 06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)
  • 05:47 marostegui: Remove db2068 from tendril and zarcillo T235399
  • 04:56 marostegui: Depool labsdb1009 for on-site maintenance - T233273
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9318 and previous config saved to /var/cache/conftool/dbconfig/20191014-045629-marostegui.json

2019-10-13

  • 00:52 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ec77b1b (duration: 00m 55s)

2019-10-12

  • 23:21 krinkle@deploy1001: Synchronized wmf-config/profiler.php: bfa8bb69c1f, T231564 (duration: 00m 51s)
  • 21:07 krinkle@deploy1001: Synchronized php-1.35.0-wmf.1/includes/resourceloader/ResourceLoaderStartUpModule.php: 8c6baeae2 (duration: 00m 53s)
  • 20:57 Urbanecm: Reset user email of User:Gardini (T235318)
  • 18:38 _joe_: deleting zotero pods with excessive memory usage in eqiad
  • 16:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: T235334 (duration: 00m 51s)
  • 16:15 reedy@deploy1001: Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBacklinksprop.php: T235334 (duration: 00m 56s)
  • 04:37 krinkle@deploy1001: Synchronized wmf-config/profiler.php: 29d8469 (duration: 00m 57s)

2019-10-11

  • 15:39 AndyRussG: updated fruec from 18d89675d0 to 1e6a6ee2de
  • 13:57 moritzm: rebooting cloudbackup2001
  • 13:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 12:48 XioNoX: disable SIP ALG on pfw3-eqiad - T235150
  • 12:47 XioNoX: disable SIP ALG on pfw3-codfw - T235150
  • 12:45 moritzm: installing libxslt security updates
  • 12:35 moritzm: installing zsh updates from stretch point release
  • 12:33 moritzm: installing gsoap security updates on stretch
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
  • 12:31 moritzm: installing libcaca security updates on stretch
  • 12:25 XioNoX: push firewall policies to pfw3-eqiad - T235074
  • 12:24 XioNoX: push firewall policies to pfw3-codfw - T235074
  • 11:51 moritzm: installing unzip security updates on stretch
  • 11:08 moritzm: upgrading debdeploy to 0.0.99.11
  • 10:18 moritzm: imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
  • 10:11 hashar: Restarting Gerrit # T224448
  • 10:02 hashar: gerrit: killed a stalled SendEmail thread that was holding a lock
  • 08:34 moritzm: remove kafka2001-2003 from debmonitor DB (T235125)
  • 08:32 moritzm: remove kafka1001-1003 from debmonitor DB (T235125)
  • 08:30 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 moritzm: reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
  • 07:32 XioNoX: rollback two previous HE peering deactivate
  • 07:30 XioNoX: deactivate HE peering on cr2-eqord for packet loss
  • 07:28 XioNoX: deactivate HE peering on cr1-eqiad for packet loss
  • 06:13 marostegui: Compress tables on db2085:3318 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
  • 05:27 papaul: rebooting an-conf1001 for serial troubleshooting
  • 05:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
  • 02:14 mutante: gerrit - "manually" starting replication via ssh command
  • 02:13 mutante: gerrit - restart service to ensure last config change is picked up
  • 02:10 mutante: gerrit1001 - attempt to manually start replication to github

2019-10-10

  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMFMobileFormatterHeadings, unread T232690 (duration: 00m 51s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Update cron-updated miser pages to say they are run periodically, not never (duration: 00m 51s)
  • 22:10 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Remove debug line dating from 2015-12-08! (duration: 00m 51s)
  • 22:04 jforrester@deploy1001: Synchronized wmf-config/mc.php: Drop nutcracker indirection for HHVM servers, just point to localhost (duration: 00m 51s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Drop special-case for PHP7, now always used (duration: 00m 51s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop HHVM special-case for SVG converter, no longer used (duration: 00m 51s)
  • 21:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't check to shard static config cache for HHVM any more (duration: 00m 50s)
  • 21:48 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Don't check to shard wmgWBSharedCacheKey for HHVM any more (duration: 00m 51s)
  • 21:39 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/lib/ve/src/dm/ve.dm.TreeCursor.js: T234881 TreeCursor: cross ignored nodes properly from the end of a text node (duration: 00m 54s)
  • 20:36 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004 (duration: 00m 06s)
  • 20:36 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004
  • 20:13 hoo: Updated the Wikidata property suggester with data from the 2019-09-30 JSON dump and applied the T132839 workarounds
  • 19:33 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 19:29 marxarelli: promoted 1.35.0-wmf.1 to all wikis. no rise in errors rates. no new relevant errors cc: T233849
  • 19:25 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.1
  • 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1
  • 19:09 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s)
  • 19:04 marxarelli: promoting labswiki to 1.35.0-wmf.1 cc: T233849
  • 17:07 jbond42: puppetmaster1001 has been upgraded and is back serving requests
  • 16:21 urandom: Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803
  • 16:18 urandom: Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:16 urandom: Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:11 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:07 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:04 thcipriani: restarting gerrit due to T224448
  • 16:04 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:01 urandom: Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 15:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s)
  • 15:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json
  • 14:54 moritzm: ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage)
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json
  • 14:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 14:03 jbond42: re-enable puppet now ca has been correctly moved
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json
  • 13:50 jbond42: disable puppet fleet wide as puppetmaster2002 is struggling
  • 13:32 jbond42: reimage puppetmaster1001
  • 13:27 marostegui: Repool labsdb1011 after reclone - T235016
  • 13:16 arturo: added flannel 0.5.5-4 to buster-wikimedia (T235059)
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s)
  • 13:00 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 12:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 11:57 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:57 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:48 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:46 jbond@cumin2001: Updating IPMI password on 35 hosts - jbond@cumin2001
  • 11:46 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Fix typo in beta repo data bridge config (T235033) (duration: 00m 59s)
  • 11:40 marostegui: Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135
  • 11:38 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:38 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:38 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:37 arturo: icinga downtime cloudvirt1023 for 2h (T227536)
  • 11:36 arturo: icinga downtime cloudvirt1025 for 2h (T227536)
  • 11:36 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:36 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:36 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:35 arturo: icinga downtime cloudvirt1026 for 2h (T227536)
  • 11:35 marostegui: Stop replication on db2077 to change triggers on db2095:3317 - T234704
  • 11:23 moritzm: installing reportbug updates from stretch point release
  • 11:22 Lucas_WMDE: EU SWAT done
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Set dataBridgeEnabled repo setting on beta (T235033) (affects InitialiseSettings-labs.php and Wikibase.php, but Wikibase.php part is guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:14 Lucas_WMDE: ^ (and by CS, I actually mean Wikibase.php, not CommonSettings.php, sorry)
  • 11:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Rename data bridge config variable names (T235033) (affects IS-labs and CS, but the CS part is all guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 10:38 moritzm: rebalancing Ganeti eqiad/row C after rolling reboots of Ganeti nodes
  • 10:34 volans: uploaded spicerack_0.0.28-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 08:23 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:12 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP - T233654 (duration: 01m 01s)
  • 07:55 marostegui: Stop MySQL on es1014 es1013 db1084 db1083 db1077 db1076 db1112 db1124 db1118 for on-site PDU maintenance (this will generate lag on labsdb hosts) - T227536
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:56 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Drop designate_pool_manager database from m5 - T233978
  • 06:33 marostegui: Revoke privileges from designate user on the designate_pool_manager database - T233978
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for PDU maintenance T227536', diff saved to https://phabricator.wikimedia.org/P9294 and previous config saved to /var/cache/conftool/dbconfig/20191010-055153-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1078 into rc service for s3 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9293 and previous config saved to /var/cache/conftool/dbconfig/20191010-055102-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 db1083 db1076 db1118 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9292 and previous config saved to /var/cache/conftool/dbconfig/20191010-054853-marostegui.json
  • 05:47 marostegui: Depool db1084 db1083 db1076 db1118 for PDU maintenance - T227536
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 marostegui: Deploy schema change on db1061 (s6 eqiad master) - T233135 T234066
  • 04:43 marostegui: Depool labsdb1011 for recloning - T235016
  • 00:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 00:39 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 00:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 00:38 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset

2019-10-09

  • 23:55 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 03m 57s)
  • 23:51 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: (no justification provided)
  • 23:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable AMC on all wikis (T233612) (duration: 00m 58s)
  • 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Turn on AMC outreach modal (T234026) (duration: 00m 59s)
  • 22:01 mutante: restarting gerrit to revert replication config change (T235135)
  • 21:27 godog: swift eqiad-prod: add ms-be105[1-6] - T232367
  • 21:02 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: (no justification provided) (duration: 00m 02s)
  • 21:02 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 21:02 otto@deploy1001: deploy aborted: (no justification provided) (duration: 38m 29s)
  • 20:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006 (duration: 01m 44s)
  • 20:53 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006
  • 20:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds (duration: 02m 42s)
  • 20:41 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds
  • 20:31 papaul: rebooting ms-be1051 to access BIOS
  • 20:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e (duration: 06m 22s)
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 20:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 00m 10s)
  • 20:16 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 05m 34s)
  • 20:10 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:09 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 02m 23s)
  • 20:06 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:56 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 00m 12s)
  • 19:54 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:54 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:52 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 08m 00s)
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:44 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 09m 33s)
  • 19:34 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:25 marxarelli: 1.35.0-wmf.1 promoted to group1, labswiki rolled back to 1.34.0-wmf.25 and to be kept back, cc: T233849
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki rollback to 1.34.0-wmf.25 due to hhvm
  • 19:09 urandom: Upgrade restbase-dev1006-{a,b} to Cassandra 3.11.4 -- T200803
  • 19:09 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.1 (duration: 00m 58s)
  • 19:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.1
  • 18:51 urandom: Upgrade restbase-dev1005-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:45 urandom: Upgrade restbase-dev1004-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:43 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid config changes
  • 17:19 eileen: civicrm revision changed from 2ba100486e to 5a2f8048c4, config revision is 5560cc0878
  • 16:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:48 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9289 and previous config saved to /var/cache/conftool/dbconfig/20191009-160506-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9288 and previous config saved to /var/cache/conftool/dbconfig/20191009-153705-marostegui.json
  • 15:04 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:02 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1085 vslow and dump group', diff saved to https://phabricator.wikimedia.org/P9287 and previous config saved to /var/cache/conftool/dbconfig/20191009-145102-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9286 and previous config saved to /var/cache/conftool/dbconfig/20191009-144928-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9285 and previous config saved to /var/cache/conftool/dbconfig/20191009-144607-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9284 and previous config saved to /var/cache/conftool/dbconfig/20191009-144400-marostegui.json
  • 14:38 elukey: cr1-eqsin: change IPv6 address for BGP peer AS4761
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9283 and previous config saved to /var/cache/conftool/dbconfig/20191009-141137-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9282 and previous config saved to /var/cache/conftool/dbconfig/20191009-140749-marostegui.json
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: rebalancing Ganeti eqiad/row A after rolling reboots of Ganeti nodes
  • 13:48 jbond42: reimage puppetmaster2001
  • 13:37 vgutierrez: repooling cp1085 - T231525
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1075', diff saved to https://phabricator.wikimedia.org/P9280 and previous config saved to /var/cache/conftool/dbconfig/20191009-133709-marostegui.json
  • 13:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928 (duration: 14m 26s)
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9279 and previous config saved to /var/cache/conftool/dbconfig/20191009-125641-marostegui.json
  • 12:42 marostegui: Stop MySQL and power off db1074 for BBU replacement T231638
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9278 and previous config saved to /var/cache/conftool/dbconfig/20191009-124218-marostegui.json
  • 12:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2 (duration: 08m 18s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9277 and previous config saved to /var/cache/conftool/dbconfig/20191009-124035-marostegui.json
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 moritzm: disabled puppet on DNS recursors for staged rollout of ferm NTP change
  • 12:35 jbond42: reimage puppetmaster2002
  • 12:32 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2
  • 12:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928 (duration: 09m 40s)
  • 12:28 vgutierrez: depooling cp1085 for a power drain - T231525
  • 12:20 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928
  • 12:13 moritzm: draining ganeti1001 for upcoming reboot (combined kernel/qemu security updates)
  • 12:10 moritzm: failover Ganeti master in eqiad to ganeti1003
  • 12:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:32 moritzm: draining ganeti1008 for upcoming reboot (combined kernel/qemu security updates)
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 Amir1: EU SWAT is done
  • 11:04 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put write both limit down to Q70m for item terms (T234948) (duration: 01m 10s)
  • 11:04 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:58 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:18 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:44 moritzm: draining ganeti1007 for upcoming reboot (combined kernel/qemu security updates)
  • 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:59 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change, temporarily pool db1085 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9276 and previous config saved to /var/cache/conftool/dbconfig/20191009-085016-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P9275 and previous config saved to /var/cache/conftool/dbconfig/20191009-084732-marostegui.json
  • 08:39 vgutierrez: Switch cp1082 from nginx to ats-tls - T231433
  • 08:24 moritzm: draining ganeti1006 for upcoming reboot (combined kernel/qemu security updates)
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: Switch cp2011 from nginx to ats-tls - T231433
  • 07:48 moritzm: reduced RAM assignment for boron to 8G
  • 07:38 vgutierrez: Switch cp3038 from nginx to ats-tls - T231433
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:34 vgutierrez: switching from nginx to ats-tls on cp4024 - T231433
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013, es1014 T227536 (duration: 01m 00s)
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change - lag will be generated on s6 labs', diff saved to https://phabricator.wikimedia.org/P9274 and previous config saved to /var/cache/conftool/dbconfig/20191009-051911-marostegui.json
  • 05:11 marostegui: Restart gerrit as it is down
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P9273 and previous config saved to /var/cache/conftool/dbconfig/20191009-045941-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312', diff saved to https://phabricator.wikimedia.org/P9272 and previous config saved to /var/cache/conftool/dbconfig/20191009-044752-marostegui.json
  • 04:40 vgutierrez: switching cp5004 from nginx to ats-tls - T231433

2019-10-08

  • 23:28 mutante: phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
  • 23:05 ebernhardson@deploy1001: Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration so it can be more easily over-ridden (duration: 00m 59s)
  • 21:28 XenoRyet: updated payments-wiki from d2e2637275 to 8a65f57874
  • 21:09 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 20:38 mutante: labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
  • 20:24 mutante: labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php and change "false" to "true" on line 52 to enable LDAP debug logging for T234996
  • 19:51 marxarelli: 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
  • 19:43 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
  • 19:38 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
  • 19:29 shdubsh: adding swagger exporter to apt repo
  • 19:13 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
  • 18:54 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
  • 18:53 godog: codfw-prod: more weight to ms-be205[1-6] - T233638
  • 18:45 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
  • 17:32 marxarelli: cutting wmf/1.35.0-wmf.1
  • 16:17 cstone: civicrm revision changed from db7ef10bfa to 2ba100486e
  • 16:00 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:30 XioNoX: remove 2 more sessions to AS12871 on cr2-esams - T232617
  • 15:20 XioNoX: add BGP sessions to AS199524 on cr2-eqdfw
  • 15:18 XioNoX: add BGP sessions to AS2635 on cr2-eqiad
  • 15:13 XioNoX: renumber BGP session to AS4761 on cr1-eqsin
  • 13:53 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:51 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
  • 13:50 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
  • 13:49 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 marostegui@cumin2001: dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
  • 13:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
  • 13:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
  • 13:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
  • 12:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
  • 12:27 marostegui: Stop MySQL on es1012 for onsite maintenance
  • 12:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:10 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:58 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:57 jbond42: testing ipmi reset cookbook. using the current pass for both old and new so no reset actually occurs
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:57 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:22 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:21 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 moritzm: draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
  • 10:16 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
  • 10:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:09 mobrovac@deploy1001: Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
  • 09:20 marostegui: Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
  • 09:09 moritzm: draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 mobrovac@deploy1001: Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
  • 08:38 mobrovac@deploy1001: Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
  • 08:33 elukey: roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
  • 08:10 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:10 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:09 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 07:51 moritzm: draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:49 akosiaris: update OTRS to 5.0.38
  • 07:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
  • 07:10 moritzm: draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
  • 06:48 marostegui: Stop MySQL on es1011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
  • 06:09 marostegui: Repool labsdb1011 after mysql upgrade
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:44 elukey: drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
  • 05:35 elukey: drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
  • 05:25 marostegui: Depool labsdb1011 for mysql upgrade
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
  • 05:10 marostegui: Reload query killer on labsdb1011
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
  • 05:07 marostegui: Deploy schema change on db1097:3315 - T233625
  • 03:04 andrewbogott: restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 - experimental band-aid for T234876
  • 00:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)

2019-10-07

  • 23:52 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:26 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 00m 49s)
  • 23:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:21 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)
  • 22:29 chaomodus: restart nagios-nrpe-server on stat1007
  • 21:56 mutante: gerrit2001 - sudo rm /etc/apache2/sites-available/50-gerrit-slave-wikimedia-org.conf
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Run Labs config after CSP config so it can change it (duration: 00m 51s)
  • 21:20 godog: swift codfw-prod: add ms-be205[3456] - T233638
  • 20:56 XenoRyet: updated payments-wiki from b94da68f7e to d2e2637275
  • 20:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:33 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:29 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add the beta REL1_34 to ExtensionDistributor (duration: 00m 50s)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 Lucas_WMDE: Morning SWAT done
  • 19:09 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/Wikibase: SWAT: Revert "Format coordinates with limited precision" (T174504) (duration: 00m 57s)
  • 18:33 Lucas_WMDE: reopen Morning SWAT for another backport (sorry)
  • 18:26 Urbanecm: Morning SWAT done
  • 18:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: 011b6eb: 11033b7: Update VE core submodule to 2ffb699eb (TreeModifier fixes), T234489, T234742 + ve.ui.MWDefinedTransclusionContextItem: Fix handling of template names (T234817) (duration: 00m 53s)
  • 18:16 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/539978
  • 18:12 andrewbogott: apt dist-upgrade on all cloudvirts (for nova upgrades)
  • 18:12 godog: start swiftrepl eqiad -> codfw (no deletes)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f434ae3: Enable NewUserMessage on sq.wikipedia and sq.wikiquote (T234499) (duration: 00m 52s)
  • 18:07 jgleeson: Updating civicrm from c12f7bb51f to db7ef10bfa
  • 17:46 ottomata: stat1007 is unresponsive, can't login via mgmt either. powercycling.
  • 17:29 XioNoX: add BGP route damping on IX sessions - eqiad - T222424
  • 17:27 XioNoX: add BGP route damping on IX sessions - esams - T222424
  • 17:22 XioNoX: add BGP route damping on IX sessions - eqsin - T222424
  • 15:34 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae (duration: 06m 28s)
  • 15:30 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:27 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae
  • 15:27 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop writing wmgVisualEditorEnableNewMobileContext (duration: 00m 51s)
  • 15:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgVisualEditorEnableNewMobileContext (duration: 00m 52s)
  • 14:25 arturo: upgrading openstack in CloudVPS. Some IRC bots and related stuff may be unavailable (T212302)
  • 14:17 marostegui: Deploy schema change on db1139:3316 - T233135 T234066
  • 13:27 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata to write both for item term store (T225055) (duration: 00m 54s)
  • 13:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2 (duration: 06m 38s)
  • 13:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9248 and previous config saved to /var/cache/conftool/dbconfig/20191007-131720-marostegui.json
  • 13:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging (duration: 07m 01s)
  • 13:13 elukey: upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941
  • 13:09 mobrovac@deploy1001: Started deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging
  • 13:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: (no justification provided) (duration: 00m 29s)
  • 13:04 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: (no justification provided)
  • 13:04 mobrovac@deploy1001: deploy aborted: Minor tweaks to VE logging (duration: 01m 07s)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9247 and previous config saved to /var/cache/conftool/dbconfig/20191007-130317-marostegui.json
  • 13:03 mobrovac@deploy1001: Started deploy [restbase/deploy@fe39197]: Minor tweaks to VE logging
  • 12:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restrouter
  • 12:54 elukey: upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941
  • 11:44 Lucas_WMDE: EU SWAT done
  • 11:44 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of main page hack for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:42 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:41 Amir1: another hack bites the dust
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/GrowthExperiments/: SWAT: Homepage: Don't use flexbox for vertical layouts in mobile start module (T234380) (duration: 00m 53s)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on nlwiki (T234685) (duration: 00m 52s)
  • 11:16 arturo: added bdsync 0.11.1-1~wmf1 to buster-wikimedia (T234683)
  • 10:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5 (duration: 04m 17s)
  • 10:55 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5
  • 10:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4 (duration: 04m 27s)
  • 10:50 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4
  • 10:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3 (duration: 03m 53s)
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:31 _joe_: uploading confd 0.16.0 to stretch
  • 10:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2 (duration: 01m 56s)
  • 10:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2
  • 10:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)
  • 10:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772
  • 09:55 marostegui: Deploy schema change on db2129 (s6 codfw master), this will generate lag on s6 codfw - T233135 T234066
  • 08:34 hashar: gerrit: force reindexing all changes ( gerrit index start changes --force )
  • 07:09 marostegui: Remove grants for dbproxy1006 on m1 databases - T231280
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9246 and previous config saved to /var/cache/conftool/dbconfig/20191007-065645-marostegui.json
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1011 T227138 (duration: 01m 10s)
  • 06:08 elukey: upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:25 marostegui: Deploy schema change on db2124 T233135 T234066
  • 05:10 marostegui: The above was for db2095:3316 T234704
  • 05:08 marostegui: Stop replication on db2076 to modify triggers on db2096:3316 T234704
  • 05:02 marostegui: Fix replication on labsdb1011:s8
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9245 and previous config saved to /var/cache/conftool/dbconfig/20191007-045411-marostegui.json

2019-10-06

  • 20:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Racconish /home/urbanecm/T234741 (T234741)
  • 19:15 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy1018, dbproxy1019
  • 06:47 elukey: delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam

2019-10-05

  • 06:48 elukey: force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory

2019-10-04

  • 22:06 mutante: ms-be1020 - power cycle via mgmt - host down
  • 20:43 krinkle@deploy1001: Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)
  • 20:41 mutante: deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)
  • 20:32 mutante: gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)
  • 19:27 mutante: wtp1025 - mediawiki appserver classes are being applied, install in progress will trigger some new icinga alerts
  • 14:03 marostegui: Deploy schema change on db2117 T233135 T234066
  • 13:50 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:36 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:28 marostegui: Deploy schema change on db2097:3316 T233135 T234066
  • 12:23 elukey: cleaned up old files and apt-cache from an-coord1001
  • 08:41 marostegui: Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066
  • 08:32 _joe_: reuploading the old confd package to stretch-wikimedia, some incompatibility detected
  • 07:26 elukey: execute gnt-instance remove kerberos1001 on ganeti1001 - T234600
  • 07:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Deploy schema change on db2114 T233135 T234066
  • 06:22 _joe_: downgrading confd back to 0.9.0 while some templates get fixed.
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui: Deploy schema change on dbstore1005:3316 T233135 T234066
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:53 _joe_: upgrading confd on puppetmaster1001 T147204
  • 05:50 _joe_: uploading confd 0.16.0 on stretch T147204
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)

2019-10-03

  • 23:50 mutante: gerrit - restarting for replication config tweaks
  • 20:05 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 19:52 XenoRyet: updated payments-wiki from 80dead6444 to b94da68f7e
  • 19:40 mutante: mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153
  • 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 19:30 marxarelli: 1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors
  • 19:21 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25
  • 19:19 mutante: puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)
  • 19:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:52 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)
  • 18:43 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c2b3d7c (duration: 00m 59s)
  • 18:14 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)
  • 17:13 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)
  • 17:07 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7
  • 13:49 elukey: roll restart hadoop yarn resource managers for openssl updates on Hadoop workers
  • 13:44 marostegui: Stop MySQL and shutdown es1019 for on-site maintenance - T233698
  • 13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)
  • 13:29 hashar: Gerrit should be back
  • 13:26 hashar: restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl
  • 13:22 hashar: Gerrit might be dead again; taking traces
  • 13:04 _joe_: restarting php7 on mw1275
  • 12:54 onimisionipe: force shard allocation on eqiad chi cluster
  • 10:27 elukey: killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs
  • 10:25 jbond42: rolling upgrade of openssl packages
  • 10:21 Urbanecm: Manually cleared signup throttle for IP 80.188.128.54 at cswiki, issue with introduced throttle rule
  • 10:20 Urbanecm: Manually cleared signup throttle for IP 88.100.221.84 at cswiki, issue with introduced throttle rule
  • 10:18 Urbanecm: Manually cleared signup throttle for IP 90.176.155.12 at cswiki, issue with introduced throttle rule
  • 09:32 elukey: run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop)
  • 08:33 marostegui: Deploy schema change on db2087:3316 T233135 T234066
  • 08:28 marostegui: Deploy schema change on db1096:3316 - T233625
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9236 and previous config saved to /var/cache/conftool/dbconfig/20191003-082651-marostegui.json
  • 08:15 akosiaris: slowly rolling restart all pods in eqiad, codfw, staging for log rollover before merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539912
  • 07:49 marostegui: Set notes on the sanitarium masters - T234039
  • 07:19 marostegui: Remove unused labspuppet database from m5 - T233281
  • 07:03 @: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 07:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 06:59 eileen: tools revision changed from e1b81688c6 to b3c7453be2
  • 06:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 06:48 marostegui: Drop database grants on m5 for labspuppet - T233281
  • 06:37 marostegui: Rename tables on m5 master on designate_pool_manager - T233978
  • 06:16 marostegui: Deploy schema change on db2089:3316 T233135 T234066
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 eileen: civicrm revision changed from 12c5727a23 to c12f7bb51f, config revision is 422a0f7d48
  • 02:07 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1c599baea51f9 (duration: 01m 03s)
  • 01:05 mutante: gerrit1001 - shutdown - scheduled downtime
  • 00:51 mutante: gerrit1001 - removing wrong IPv6 address from interface, running puppet

2019-10-02

  • 23:42 XioNoX: enable cr2-eqiad:xe-4/0/0 - T234416
  • 23:38 XioNoX: disable cr2-eqiad:xe-4/0/0 - T234416
  • 23:22 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 00s)
  • 23:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 02s)
  • 22:29 godog: remove queued messages from mx1001 for fr-tech-ops@, triggering sender rate limit from gmail
  • 22:12 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:11 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 00m 59s)
  • 22:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 01m 00s)
  • 21:17 mutante: cobalt (gerrit) rsyncing /srv/gerrit/git and /srv/gerrit/plugins data to gerrit1001 again after reinstall and fixing gerrit2 UID/GID (T222391)
  • 21:13 mutante: gerrit1001 - rebooting
  • 21:08 mutante: gerrit1001 changing GID of gerrit2 user to 119 in /etc/group ; find / -uid 499 -exec chown gerrit2 {} \; find / -gid 1001 -exec chown gerrit2:gerrit2 {} \; (T222391)
  • 21:03 mutante: gerrit1001 changing UID of gerrit2 user to 114 and GID to 119 in /etc/passwd to match cobalt to avoid privilege issues after rsyncing data (T222391)
  • 19:58 mutante: puppetmaster1001 - sudo puppet cert clean parsoid.discovery.wmnet (only created yesterday but does not have all the SANs it needs, updating with more SANs) (T233654)
  • 19:47 Jeff_Green: deployed icinga fundraising-nsca collection configuration change
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 marxarelli: 1.34.0-wmf.25 promoted to group1, cc: T220750. no rise in relevant error rates
  • 19:23 dduvall@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.25 (duration: 00m 59s)
  • 19:22 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.25
  • 18:28 XioNoX: add BGP route damping on IX sessions - eqord - T222424
  • 18:25 XioNoX: add BGP route damping on IX sessions - eqdfw - T222424
  • 18:15 XioNoX: add BGP route damping on IX sessions - ulsfo - T222424
  • 17:08 Lucas_WMDE: Morning SWAT done
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: SWAT: vector.js: Remove eager calculation of p-cactions width on page load (duration: 01m 00s)
  • 16:53 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: Enabling revision-score stream in eventstreams
  • 16:50 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:50 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: (no justification provided) (duration: 00m 01s)
  • 16:50 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: ApiVisualEditor: Add logging for RESTBase HTTP errors (T233127) + ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 04s)
  • 16:42 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/: SWAT: ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 03s)
  • 15:31 godog: correction, add ms-be2052
  • 15:29 godog: swift codfw-prod: add ms-be2051 T233638
  • 15:13 godog: run swiftrepl eqiad -> codfw on ms-fe1005 (no deletes)
  • 14:31 moritzm: installing libxslt security updates on stretch
  • 14:16 moritzm: installing babeltrace bugfix update from buster point release
  • 13:18 moritzm: installing mariadb-10.3 update from buster point release (just client side libs, tools)
  • 13:16 moritzm: installing console-setup bugfix update from buster point release
  • 11:28 moritzm: installing cryptsetup bugfix from buster 10.1 point release
  • 11:26 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 01711d5: Enable partial blocks at ptwiki (T233754) (duration: 00m 55s)
  • 11:26 jbond42: update puppet.eqiad.wmnet to puppetmaster2001
  • 11:24 jbond42: update puppet.esams.wmnet to puppetmaster2001
  • 11:20 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set new MFMobileFormatterOptions config using old config (T232690) (duration: 01m 01s)
  • 11:15 _joe_: testing the package on restbase-dev1006
  • 11:14 _joe_: uploaded service-checker 0.2.0 to stretch-wikimedia
  • 11:12 pmiazga@deploy1001: Synchronized wmf-config/mobile.php: SWAT: Do not set wgMFNoindexPages config flag in mobile.php (T206497) (duration: 01m 14s)
  • 10:17 gehel@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:17 gehel@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:41 moritzm: rebalancing Ganeti/codfw Row A after rolling reboot of Ganeti nodes
  • 07:46 moritzm: upgrading remaining stretch hosts to ferm 2.4.2pre
  • 06:23 marostegui: Fix replication on labsdb1011:s7 - T233986
  • 06:17 marostegui: Fix replication on labsdb1011:s1 - T233986
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:07 vgutierrez: restarting trafficserver-tls on cp5007
  • 00:54 ejegg: updated fundraising CiviCRM from 6d90d0cf06 to 12c5727a23
  • 00:34 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/resources/src: 5eb3ae1 (duration: 01m 00s)
  • 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: d30064229f9 (duration: 00m 59s)

2019-10-01

  • 23:46 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: T233127: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)
  • 23:44 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:28 mutante: cobalt (gerrit) rsyncing /srv/gerrit/plugins dir, push to new server gerrit1001 (T222391)
  • 23:21 mutante: gerrit1001 - chown -R gerrit2:gerrit2 /srv/gerrit/git/ (T222391)
  • 23:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233211: CirrusSearch: Configuration for glent m0 AB test (duration: 00m 58s)
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233127: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)
  • 22:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 22:19 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 21:34 godog: swift codfw-prod: add ms-be2051 with minimal weight - T233638 T222366
  • 21:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: bb2fd9cf9c22cc (duration: 01m 00s)
  • 21:29 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 21:29 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 20:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 20:10 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:58 mutante: cobalt (gerrit) - rsyncing gerrit data to gerrit1001 in a screen session (T222391)
  • 19:47 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 19:47 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:42 marxarelli: 1.34.0-wmf.25 promoted to group0 cc: T220750. no rise in relevant error rates
  • 19:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.25
  • 19:30 marxarelli: promoting 1.34.0-wmf.25 to group0
  • 19:28 dduvall@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache (duration: 19m 31s)
  • 19:08 dduvall@deploy1001: Started scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache
  • 19:07 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.23 (duration: 01m 32s)
  • 19:04 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.22 (duration: 01m 41s)
  • 19:02 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.21 (duration: 01m 57s)
  • 19:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 19:00 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 18:59 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.20 (duration: 02m 11s)
  • 18:57 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.19 (duration: 02m 12s)
  • 18:54 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.17 (duration: 02m 48s)
  • 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 (duration: 18m 45s)
  • 17:53 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:52 thcipriani: gerrit restart for new config changes incoming
  • 17:52 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:48 XioNoX: rotate PDUs passwords - T233053
  • 17:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)
  • 17:07 mutante: Welcome new deployer Andrew Kostka (WMDE) (T233202)
  • 17:07 marxarelli: cutting wmf/1.34.0-wmf.25
  • 16:16 _joe_: manually downgrading php-geoip on deploy*, it was still at the 7.0-only version from the distro
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:10 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 15:36 _joe_: uninstalling temporarily the math rendering related packages from mwdebug2002, test for T195847
  • 15:36 elukey: powercycle an-conf1001 to test some bios settings
  • 15:12 jbond42: puppetmaster2001 is back online
  • 14:34 dcausse: created cirrussearch indices for nqowiki (T234326)
  • 14:18 moritzm: rebooting krb1001 for some tests
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:10 hashar: Restarting CI Jenkins
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/labs/private ; git rev-parse HEAD | sudo tee /srv/config-master/labsprivate-sha1.txt )
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/operations/puppet ; git rev-parse HEAD | sudo tee /srv/config-master/puppet-sha1.txt )
  • 14:08 herron: beginning rolling reboots of eqiad and codfw logstash collectors
  • 14:02 moritzm: rebooting mw1265 for some tests
  • 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:59 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ sudo touch /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt && sudo chown gitpuppet:gitpuppet /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:24 jbond42: reimage puppetmaster2001
  • 12:37 hashar: Gerrit misbehaved temporarily due to human operator error (hashar ran jstack -l -m, which brings the JVM to a halt)
  • 11:16 jbond42: update puppet.ulsfo.wmnet to point to puppetmaster1001
  • 10:45 jbond42: update puppet.eqsin.wmnet to point to puppetmaster1001
  • 10:17 moritzm: upgrading ferm on remaining mw servers 2.4.2pre T153468
  • 09:35 moritzm: run systemctl reset-failed on puppetmaster2002 to clear failed puppet-master.service
  • 09:19 moritzm: upgrading ferm on a number of systems to 2.4.2pre T153468
  • 09:07 vgutierrez: restarting acme-chief on acmechief1001 to catch up with python3-cryptography upgrades - T234131
  • 09:04 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acme-chief hosts - T234131
  • 09:03 moritzm: rebalancing ganeti/row_B after rolling reboot
  • 08:57 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acmechief-test1001 - T234131
  • 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:00 moritzm: draining ganeti2003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:00 hashar: gerrit: forcing reindex of changes # T233989
  • 06:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:28 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9223 and previous config saved to /var/cache/conftool/dbconfig/20191001-061956-marostegui.json
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:12 mutante: phabricator - upgrading PHP version to 7.2.22 - T230024

2019-09-30

  • 23:28 niharika29@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CentralNotice/resources/infrastructure/: CentralNotice: Replace deprecated editToken with csrfToken - T233538 (duration: 00m 57s)
  • 23:23 AndyRussG: updated fruec from c591bd653b to 18d89675d0
  • 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 21:47 mutante: mw1290 - scap pull to get it in sync with latest deployment - it was down during scap run for T234153
  • 21:42 jforrester@deploy1001: Synchronized robots.txt: Remove old InternetArchive bot rule that's been disabled since 2008 T7582 (duration: 00m 57s)
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T222539 Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)
  • 21:38 James_F: sync failure on mw1290.eqiad.wmnet – Connection timed out
  • 21:26 mutante: mw1290 - downtimed for onsite work on mgmt, depooled earlier
  • 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 21:08 XioNoX: delete BGP to AS131285 on cr1-eqsin
  • 20:43 arlolra: Updated Parsoid to 1922eb6 (T233459, T230359, T208070)
  • 20:43 arlolra: T208070
  • 20:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6 (duration: 08m 39s)
  • 20:25 arlolra@deploy1001: Started deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6
  • 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f (duration: 05m 55s)
  • 20:00 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f
  • 19:15 XenoRyet: Updated payments-wiki from 5193dcdfa9 to 80dead6444
  • 17:37 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 03m 03s)
  • 17:33 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:24 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:18 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 00m 05s)
  • 17:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:15 twentyafterfour@deploy1001: deploy aborted: fix T234223 (duration: 06m 24s)
  • 17:10 twentyafterfour: deploy failed
  • 17:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:08 twentyafterfour: deploying minor update to phatality to fix T234223
  • 16:35 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:34 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0aa4b4b (duration: 00m 57s)
  • 16:34 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226 (duration: 01m 17s)
  • 16:32 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: 0aa4b4b (duration: 00m 57s)
  • 16:32 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:25 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:49 moritzm: installing console-setup bugfixes from Buster 10.1 point release
  • 15:46 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:46 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:42 moritzm: failover Ganeti master in codfw to ganeti2001
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:29 moritzm: draining ganeti2007 for upcoming reboot (combined kernel/qemu security updates)
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:08 moritzm: draining ganeti2006 for upcoming reboot (combined kernel/qemu security updates)
  • 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:54 moritzm: draining ganeti2005 for upcoming reboot (combined kernel/qemu security updates)
  • 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 kart_: Update cxserver to 2019-09-26-034732-production (T233834, T232674, T233085)
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:29 jbond42: offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322
  • 12:27 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:00 Urbanecm: EU SWAT done #2
  • 12:00 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 3f4f242: New throttle rule for Czech wiki course (T234113) (duration: 00m 56s)
  • 11:57 Urbanecm: Reopen EU SWAT to deploy throttle rule for October 02 (T234113)
  • 11:54 raynor: EU SWAT finished
  • 11:54 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable alternate mobile link for it, nl, ko wikis. (T206497) (duration: 00m 57s)
  • 11:27 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 539517|Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007) (duration: 00m 59s)
  • 11:20 hashar: Restarting Docker on integration-agent-puppet-docker-1001 # T234197
  • 11:08 hashar: Restarting Docker on CI agents to clear out some docker/iptables oddity # T234197
  • 10:48 hashar: CI outage is tracked in https://phabricator.wikimedia.org/T234197
  • 10:42 moritzm: draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates)
  • 10:40 hashar: CI down due to some DNS related failure on the hosts :-\
  • 10:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:30 moritzm: uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups (T153468)
  • 09:11 moritzm: draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates)
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json
  • 08:01 moritzm: installing e2fsprogs security updates on Stretch/Buster
  • 07:56 marostegui: Stop dbstore1003:3311 for troubleshooting
  • 06:47 moritzm: installing exim security updates on buster

2019-09-28

  • 16:28 vgutierrez: restarting acme-chief on acmechief1001

2019-09-27

  • 22:44 mutante: phab2001 - apt-get autoremove - remove unused python and ruby packages
  • 22:36 mutante: phab2001 - upgrade php7.2 packages to 7.2.22 (T230024)
  • 22:03 mutante: webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names
  • 18:22 mutante: mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063)
  • 18:17 mutante: mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063)
  • 16:01 XioNoX: delete BGP to AS34305 on cr2-esams
  • 15:34 elukey: update pcc facts to add new hosts
  • 15:02 moritzm: installing usb.ids update from Buster 10.1 point release
  • 14:45 moritzm: installing ncurses bugfix update from Buster 10.1 point release
  • 14:39 moritzm: installing postgresql-common bugfix update from Buster 10.1 point release
  • 14:32 effie: Disable puppet and reload apache on mw* for 539465 and 539488 - T229792
  • 13:33 marostegui: Set candidate masters in dbctl T234039
  • 13:31 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:29 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:16 moritzm: reimaging auth1002 to buster
  • 13:09 akosiaris: reboot ganeti2001 T233906
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:03 effie: Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7
  • 12:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 moritzm: installing openldap security updates on Buster
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:37 moritzm: killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated)
  • 12:30 moritzm: installing glib2.0 security updates on Buster
  • 12:14 moritzm: reimaging auth2001 to buster
  • 12:06 moritzm: install gnupg2 security update from Buster 10.1 point release
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json
  • 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:02 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for Czech course (T234024) (duration: 00m 59s)
  • 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:06 moritzm: running a few ferm tests on cp1008, puppet disabled
  • 07:36 godog: swift eqiad-prod: remove ms-be1027 - T233289
  • 05:42 XioNoX: remove tcp-mss clamping from cr2-eqiad - T232602
  • 05:30 XioNoX: remove tcp-mss clamping from cr2-eqord - T232602
  • 05:23 XioNoX: remove tcp-mss clamping from cr1-eqiad - T232602
  • 00:53 twentyafterfour: hotfixing phabricator fatal exception refs T233998

2019-09-26

  • 22:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211620 Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
  • 22:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
  • 22:02 jforrester@deploy1001: Synchronized wmf-config/filebackend.php: T228547 Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
  • 21:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228547 Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
  • 21:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228547 Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
  • 20:53 ejegg: updated fundraising CiviCRM from 52d2a24404 to 6d90d0cf06
  • 19:58 phedenskog@deploy1001: Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s)
  • 19:58 phedenskog@deploy1001: Started deploy [performance/navtiming@1880a79]: Test deploy
  • 19:52 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:52 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:46 phedenskog@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs T220749
  • 19:17 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s)
  • 19:17 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release (test)
  • 19:13 twentyafterfour: preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs T220749
  • 18:45 ayounsi@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s)
  • 18:44 ayounsi@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:35 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s)
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s)
  • 18:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s)
  • 18:29 mforns@deploy1001: Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101)
  • 18:25 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s)
  • 18:25 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:17 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s)
  • 18:15 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s)
  • 18:15 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s)
  • 18:07 ayounsi@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s)
  • 18:06 ayounsi@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:04 mutante: running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for T233654
  • 18:04 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s)
  • 18:04 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:03 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s)
  • 18:02 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeouts arrays, already set in IS (duration: 01m 04s)
  • 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeouts arrays directly (duration: 01m 00s)
  • 17:49 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s)
  • 17:49 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s)
  • 17:43 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 (duration: 02m 04s)
  • 17:41 ppchelko@deploy1001: Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211
  • 17:35 elukey: run apt-get autoremove on stat* and notebook* to clean up old python2 deps
  • 17:31 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad
  • 17:11 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:08 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:40 papaul: upgrading firmware on scs-c1-codfw
  • 16:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw
  • 15:56 cdanis: sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams
  • 15:35 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo
  • 15:15 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin
  • 15:06 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s)
  • 15:03 mforns@deploy1001: Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap
  • 15:00 cdanis: dbctl schema migration done T229677
  • 14:47 cdanis: dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades T229677
  • 14:43 cdanis@cumin1001: dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json
  • 14:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin
  • 14:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster
  • 14:36 cdanis: ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool
  • 14:19 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes
  • 14:16 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes
  • 11:31 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 (T233922)
  • 11:23 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 3/3) (duration: 01m 05s)
  • 11:14 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104)
  • 11:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 2/3) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7645e55: Enable reader demographic surveys in English, Polish, and Russian (T232525) (duration: 01m 06s)
  • 11:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 1/3) (duration: 01m 08s)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 jbond42: reimaging puppetmaster1002 to buster
  • 10:48 vgutierrez: switching from nginx to ats-tls on cp5007 - T231627
  • 09:55 moritzm: bouncing postgres on puppetdb1002/2002
  • 09:18 vgutierrez: switching from nginx to ats-tls on cp1080 - T231433
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json
  • 09:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 (duration: 21m 32s)
  • 09:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:47 vgutierrez: switching from nginx to ats-tls on cp2008 - T231433
  • 08:43 mobrovac@deploy1001: Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json
  • 08:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from 1 to 100 - T231018', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json
  • 08:13 vgutierrez: switching from nginx to ats-tls on cp3036 - T231433
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json
  • 08:07 elukey: executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json
  • 08:02 marostegui: Depool db1078 to restart mysql to change its binlog format to ROW
  • 07:57 vgutierrez: switching from nginx to ats-tls on cp4023 - T231433
  • 07:49 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:42 moritzm: draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates)
  • 07:41 vgutierrez: switching from nginx to ats-tls on cp5003 - T231433
  • 07:10 marostegui: Power off db1114 for mainboard replacement T229452
  • 07:09 marostegui: Stop mysql on db1114 for mainboard replacement - T229452
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Sanitize nqowiki on db1124:3313 and db2094:3313 - T230543
  • 06:39 marostegui: Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give some API weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T230784', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T230784', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json
  • 05:00 marostegui: Starting s4 failover from db1081 to db1138 - T230784
  • 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T230784', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json
  • 04:10 marostegui: Start pre-switchover s4 steps T230784

2019-09-25

  • 21:59 bblack: remove GRE MTU hacks on archiva1001 gerrit2001 cobalt install1002 - T232602
  • 21:58 bblack: remove GRE MTU hacks on eqiad caches (cp1xxx) - T232602
  • 21:57 bblack: remove GRE MTU hacks on esams caches (cp3xxx) - T232602
  • 21:56 bblack: remove GRE MTU hacks on eqsin caches (cp5xxx) - T232602
  • 21:10 AndyRussG: update fruec from 97128874bf to c591bd653b
  • 21:00 ejegg: updated fundraising internal dashboard from 4473c65af0 to 69fdbec60d
  • 20:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286) (duration: 05m 32s)
  • 20:20 hashar: Upgrading CI Jenkins
  • 20:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286)
  • 19:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.24 refs T220749 (duration: 01m 03s)
  • 19:27 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.24 refs T220749
  • 18:24 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: trying again (duration: 03m 31s)
  • 18:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: trying again
  • 18:19 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: deploy for version 5.6.15 (duration: 00m 50s)
  • 18:19 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: deploy for version 5.6.15
  • 18:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: Deploy phatality (duration: 00m 24s)
  • 18:13 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: Deploy phatality
  • 18:11 Amir1: creating nqowiki is finished now
  • 18:10 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 18:07 ladsgroup@deploy1001: Synchronized dblists/rtl.dblist: Create nqowiki T230359 (duration: 01m 05s)
  • 18:01 Amir1: creating nqowiki is going to take five more minutes
  • 17:57 ladsgroup@deploy1001: Synchronized langlist: Create nqowiki T230359 (duration: 01m 02s)
  • 17:56 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Create nqowiki T230359 (duration: 01m 05s)
  • 17:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create nqowiki T230359 (duration: 01m 04s)
  • 17:51 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:47 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 04s)
  • 17:29 mutante: DNS - adding nqo (N'Ko) to langlist for new nqo.wikipedia, approved by langcom https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_N'Ko (T230359)
  • 17:11 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 05s)
  • 17:08 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 04s)
  • 16:19 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey for euwiki (T233063) (duration: 01m 04s)
  • 16:06 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 537628|Fix incorrect channel name for TranslationNotifications extension (T144780) (duration: 01m 06s)
  • 15:38 moritzm: installing php5 security updates
  • 15:07 moritzm: imported jenkins 2.176.4 for jessie/stretch T233214
  • 14:57 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:57 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:55 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/Wikibase/view/lib/resources.php: Revert "Merge valueview modules": T233800 (duration: 01m 04s)
  • 14:53 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix Draft namespace aliases (T233770) (duration: 01m 04s)
  • 14:52 onimisionipe: pool wdqs1005 - lag issues have minimized.
  • 14:38 moritzm: restarting apache on analytics-tool/an-tool to pick up Expat security update
  • 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:34 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: restarting apache on grafana1001 to pick up Expat security update
  • 14:14 moritzm: restarting apache on various services to pick up Expat security update (releases, netmon, miscweb, graphite, planet,puppetboard)
  • 14:02 marostegui: Deploy schema change on db2086:3318
  • 14:00 effie: Rolling restart thumbor for expat update
  • 13:55 moritzm: rolling restart of apache on webperf* to pick up Expat security update
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9183 and previous config saved to /var/cache/conftool/dbconfig/20190925-135317-marostegui.json
  • 13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:51 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:45 _joe_: restarting trafficserver on cp1075 to pick up the change
  • 13:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230817 Remove origin trials config (duration: 01m 05s)
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9182 and previous config saved to /var/cache/conftool/dbconfig/20190925-133146-marostegui.json
  • 13:31 moritzm: installing remaining expat security updates
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9181 and previous config saved to /var/cache/conftool/dbconfig/20190925-132147-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9180 and previous config saved to /var/cache/conftool/dbconfig/20190925-131149-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after replacing its BBU', diff saved to https://phabricator.wikimedia.org/P9179 and previous config saved to /var/cache/conftool/dbconfig/20190925-130613-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9178 and previous config saved to /var/cache/conftool/dbconfig/20190925-125601-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): ' Depool for schema change on the logging table: db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9177 and previous config saved to /var/cache/conftool/dbconfig/20190925-125140-marostegui.json
  • 12:47 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:47 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:46 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 marostegui: Repool labsdb1011 T233766
  • 12:41 marostegui: Shutdown db1075 for onsite maintenance T233534
  • 12:37 marostegui: Stop MySQL on db1075 for BBU replacement T233534
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for BBU replacement T233534', diff saved to https://phabricator.wikimedia.org/P9176 and previous config saved to /var/cache/conftool/dbconfig/20190925-123736-marostegui.json
  • 12:34 onimisionipe: depool wdqs1005 to allow it to catch up on lag
  • 12:32 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 12:28 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 12:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286) (duration: 05m 17s)
  • 12:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286)
  • 12:05 akosiaris: depool kubernetes1001 and disable puppet on it for rsyslog mmkubernetes testing
  • 12:05 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.*
  • 11:57 vgutierrez: switch cp1078 from nginx to ats-tls - T231433
  • 11:37 vgutierrez: switch cp2005 from nginx to ats-tls - T231433
  • 11:29 onimisionipe: restarted wdqs-blazegraph on wdqs1005
  • 11:15 onimisionipe: repooled wdqs1004 to reduce load on the wdqs public cluster
  • 11:15 Urbanecm: EU SWAT done
  • 11:13 vgutierrez: switch cp3035 from nginx to ats-tls - T231433
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 127485c: Fully close bgwikinews (T233322) (duration: 01m 06s)
  • 10:48 vgutierrez: Switch from nginx to ats-tls on cp4022 - T231433
  • 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:27 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 16s)
  • 10:26 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:26 vgutierrez: switch cp5002 from nginx to ats-tls - T231433
  • 10:25 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 12s)
  • 10:25 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:22 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 42s)
  • 10:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 45m 54s)
  • 09:51 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'codfw' .
  • 09:27 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:20 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 02m 24s)
  • 09:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:16 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 54s)
  • 09:15 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 09:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:01 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 08:52 godog: roll-restart kibana
  • 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 05s)
  • 08:48 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 09m 26s)
  • 08:44 vgutierrez: repooling cp4027 - T233667
  • 08:39 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 07:51 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 revert: [cirrus] temp disable sanity check (duration: 01m 05s)
  • 07:38 moritzm: installing emacs updates for buster (from SUA update, extended ELPA repository key)
  • 07:28 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 04s)
  • 07:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 16s)
  • 07:17 onimisionipe: pool wdqs1005 to allow depooling wdqs1004 to handle lag issues
  • 07:17 elukey: allow analytics users to log in into stat1005
  • 06:33 _joe_: restarting pybal on all low-traffic lbs
  • 06:29 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 06:29 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 06:21 marostegui: Deploy schema change on db2085:3311 T233625
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9171 and previous config saved to /var/cache/conftool/dbconfig/20190925-062036-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:06 marostegui: Run a data check on labsdb1011 - T233766
  • 04:43 marostegui: Deploy schema change on s3 with replication - T231172
  • 03:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.24 refs T220749
  • 03:03 krinkle@deploy1001: Synchronized docroot/noc/: c7c6c0ee0, 8405bf1c2 (duration: 01m 05s)
  • 03:01 krinkle@deploy1001: Synchronized src/: c7c6c0ee0, 8405bf1c2 (for noc.wm.o) (duration: 01m 09s)
  • 02:58 twentyafterfour: belatedly promoting wmf.24 to group0 refs T220749
  • 02:32 onimisionipe: depool wdqs1005 to let it catch up with lag
  • 02:30 onimisionipe: pool wdqs1006 - it has caught up with lag
  • 01:16 mutante: stat1007 - restart nagios-nrpe-server, echo "please don't use all of the RAM on this server" | wall
  • 01:14 krinkle@deploy1001: Synchronized wmf-config/: 3373247e12 (duration: 01m 04s)
  • 01:12 krinkle@deploy1001: Synchronized src/WmfClusters.php: 3373247e123b (duration: 01m 04s)
  • 01:08 krinkle@deploy1001: Synchronized tests: 3373247e123b5 (duration: 01m 04s)
  • 01:07 krinkle@deploy1001: Synchronized docroot/noc: 3373247e123b53 and 1efc8bd (duration: 01m 05s)
  • 01:03 krinkle@deploy1001: Synchronized README: 3373247e123b53 (duration: 01m 04s)
  • 01:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3373247e123b53 - create new file (duration: 01m 05s)
  • 00:47 krinkle@deploy1001: Synchronized wmf-config/: 6dca83a9f6c2c (duration: 01m 04s)
  • 00:44 krinkle@deploy1001: Synchronized docroot/noc/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:43 krinkle@deploy1001: Synchronized tests/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:02 mutante: cp1075 - systemctl restart vhtcpd
  • 00:02 mutante: cp1075 - systemctl status vhtcpd

2019-09-24

  • 23:38 mutante: gerrit service restart to switch LDAP backend
  • 23:35 bstorm_: wiki-replicas depooled labsdb1011
  • 23:33 mutante: gerrit2001 - restarting gerrit service
  • 23:30 mutante: switching LDAP servers used by Gerrit to readonly replicas. stop using so called "labs" config for LDAP backend.
  • 22:26 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.24 refs T220749 (duration: 40m 38s)
  • 21:53 mutante: restbase1024 - enable IPMI over LAN which wasn't working before
  • 21:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.24 refs T220749
  • 21:19 mutante: ganeti4001 - racadm racreset - attempt to fix IPMI
  • 20:19 twentyafterfour: restarting gerrit due to unreasonably high garbage collection times and sluggish performance in general.
  • 19:39 XioNoX: disable asw2-d-eqiad:ge-5/0/41 excessive flapping
  • 19:28 ejegg: updated payments-wiki from 939b771800 to 5193dcdfa9
  • 19:20 twentyafterfour: branching 1.34.0-wmf.24 refs T220749
  • 18:45 AndyRussG: updated fruec from fb29cb74 to 97128874bf
  • 18:08 ejegg: updated Fundraising CiviCRM feca96a2e3 to 52d2a24404
  • 17:13 cstone: civicrm revision changed from 5def62ab05 to feca96a2e3
  • 14:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:28 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:17 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 moritzm: rebooting cloudvirt1021 for kernel update
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 13:50 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:50 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:49 jbond42__: promote puppetmaster1003 to a real puppetmaster backend https://gerrit.wikimedia.org/r/c/operations/puppet/+/538686
  • 13:45 _joe_: installing the new conftool version on the cumin hosts
  • 13:40 _joe_: uploaded conftool 1.1.4-3 to stretch-wikimedia, T233679
  • 13:19 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 13:18 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 12:22 arturo: remove systemd-sysv from jessie-wikimedia/openstack-mitaka-jessie in install1002 (T231793)
  • 12:20 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 [cirrus] temp disable sanity check (duration: 00m 55s)
  • 12:18 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 12:16 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455
  • 11:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 11:45 mobrovac@deploy1001: Started deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455
  • 11:43 Urbanecm: EU SWAT done
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 11a48f8: Add support for some languages on Commons and stop support for nys on Wikidata (T230480) (duration: 00m 56s)
  • 11:39 Urbanecm: Run mwscript initSiteStats.php --wiki=napwikisource --update (T233673)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 9eaa4f8: Set wgArticleCountMethod to any for napwikisource (T233673) (duration: 00m 56s)
  • 11:30 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/mxwikimedia.png (T233670)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: b6947c5: Follow-up 8f3f0705baed: add missing namespace for eswiki (T233562) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MassMessage/: SWAT: ba9b209: Provide deduplication info to MassMessageJob (T232379) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1001: Synchronized static/images/project-logos/mxwikimedia.png: SWAT: 246b352: Update logo for mx.wikimedia (T233670) (duration: 00m 54s)
  • 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.less: SWAT: d4c64a7: Fix broken display of mobile overlay headings (T233163) (duration: 00m 57s)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8bf6aae: Enable alternate mobile link for ar,zh,hi wikis (T206497) (duration: 00m 54s)
  • 11:10 _joe_: all wikis (including API) are now served by PHP7 T219150
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a14b772: FileImporter: limited default deployment (2/2; T232539) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8a89652: FileImporter: limited default deployment (1/2; T232539) (duration: 01m 03s)
  • 10:56 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584 (duration: 01m 00s)
  • 10:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584
  • 10:54 _joe_: converting all appservers to php7, T219150
  • 10:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953 (duration: 22m 20s)
  • 10:50 _joe_: converting mw1261 to full-php7
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953
  • 10:12 marostegui: Deploy schema change on s7 (centralauth and wikis) master with replication - T231172
  • 10:03 marostegui: Deploy schema change on s1 master with replication - T231172
  • 09:58 marostegui: Deploy schema change on labswiki (wikitech) and labtestwiki T231172
  • 09:51 effie: Upgrade to php 7.2.22 on mwmaint* - T230024
  • 09:30 marostegui: Deploy schema change on s2 master with replication - T231172
  • 09:26 effie: Upgrade to php 7.2.22 on deploy* - T230024
  • 09:14 marostegui: Drop table archive_save on frwiki T233187
  • 08:43 marostegui: Deploy schema change on s8 master with replication - T231172
  • 08:37 mvolz@deploy1001: scap-helm zotero finished
  • 08:37 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 08:37 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:36 jynus: stop db1114 mariadb process for some time
  • 08:33 moritzm: installed expat security updates on remaining mw* servers
  • 08:33 mvolz@deploy1001: scap-helm zotero finished
  • 08:32 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:30 marostegui: Deploy schema change on s4 master with replication - T231172
  • 08:29 effie: Disable puppet on api cluster and restart php-fpm to finish php7 migration - T219150
  • 08:19 mvolz@deploy1001: scap-helm zotero finished
  • 08:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 08:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 08:18 marostegui: Deploy schema change on s5 master with replication - T231172
  • 07:51 onimisionipe: depool wdqs1006 to clear HTTP too many request error
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 moritzm: uploaded openjdk-8 8u222-b10-1~deb10u2 to buster-wikimedia component/jdk8 T233604
  • 07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 godog: swift eqiad-prod: continue ms-be1027 decom T233289
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:37 marostegui: Stop MySQL on db1066 - T233071
  • 06:36 marostegui: Remove db1066 from tendril and zarcillo T233071
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075', diff saved to https://phabricator.wikimedia.org/P9163 and previous config saved to /var/cache/conftool/dbconfig/20190924-063002-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9162 and previous config saved to /var/cache/conftool/dbconfig/20190924-061943-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9161 and previous config saved to /var/cache/conftool/dbconfig/20190924-053919-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1075', diff saved to https://phabricator.wikimedia.org/P9160 and previous config saved to /var/cache/conftool/dbconfig/20190924-052545-marostegui.json
  • 05:13 cdanis@cumin1001: dbctl commit (dc=all): 're-do T230783 master promotion and set read-write', diff saved to https://phabricator.wikimedia.org/P9159 and previous config saved to /var/cache/conftool/dbconfig/20190924-051307-cdanis.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 master and remove read-only from s3 T230783', diff saved to https://phabricator.wikimedia.org/P9158 and previous config saved to /var/cache/conftool/dbconfig/20190924-051147-marostegui.json
  • 05:10 cdanis: T230783 mark DEFAULT not s3 as readonly in etcd dbconfig data
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 as read-only for maintenance T230783', diff saved to https://phabricator.wikimedia.org/P9157 and previous config saved to /var/cache/conftool/dbconfig/20190924-050034-marostegui.json
  • 05:00 marostegui: Starting s3 failover from db1075 to db1123 - T230783
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1123 T230783', diff saved to https://phabricator.wikimedia.org/P9156 and previous config saved to /var/cache/conftool/dbconfig/20190924-042121-marostegui.json
  • 04:13 marostegui: Start pre switchover steps - T230783
  • 03:52 chaomodus: rebooted netboxdb[12]001 for kernel upgrade
  • 03:46 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:45 crusnov@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:43 mutante: db2060 - remove PXE flag boot override - set Boot Device to none

2019-09-23

  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:50 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:50 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:43 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:32 catrope@deploy1001: Synchronized wmf-config/VariantSettings.php: Syncing no-op change for T232419 (duration: 00m 57s)
  • 19:57 cdanis: T233657 ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 2a7a125: Redefine hiwikisource extra namespaces (T233365) (duration: 00m 57s)
  • 19:09 Urbanecm: Going to deploy one more last-time patch
  • 18:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 (T233610) (duration: 00m 56s)
  • 18:48 Urbanecm: Morning SWAT done
  • 18:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 37fcbdf: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: be2f9d4: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 55s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: d397f5f: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 56s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8f3f070: Disallow indexing discussion and user pages on eswiki (T233562) (duration: 00m 56s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6cb2042: New throttle rule for Wikimedia Chile editathon (T233378) (duration: 00m 56s)
  • 18:13 Urbanecm: Security deploy for T207094
  • 18:03 gilles: T233095 Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
  • 17:59 gilles@deploy1001: Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 00m 56s)
  • 17:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config (T233610) (duration: 00m 58s)
  • 16:53 elukey@deploy1001: Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
  • 16:46 elukey@deploy1001: Started deploy [analytics/refinery@b99647e]: (no justification provided)
  • 16:33 Urbanecm: Remove my temporary adminship on bgwikinews (T233322)
  • 16:29 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 2/2) (duration: 00m 56s)
  • 16:27 urbanecm@deploy1001: Synchronized dblists/closed.dblist: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 1/2) (duration: 00m 58s)
  • 16:26 Urbanecm: mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary (T233322)
  • 13:21 moritzm: installing qemu security updates on remaining cloudvirt hosts
  • 12:40 moritzm: rolling restart of graphoid on scb to pick up expat security update
  • 12:05 moritzm: restarting apache on bast5001 to pick up expat security update
  • 11:50 moritzm: restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
  • 11:42 vgutierrez: switching cp4027 from nginx to ats-tls - T231627
  • 11:35 moritzm: installing expat security updates
  • 11:33 awight: EU SWAT finished
  • 11:31 awight@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: Add change tags to all FileImport text revisions (T227849) (duration: 00m 57s)
  • 11:23 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Set item terms on write both up to Q40Mio (T225055) (duration: 00m 55s)
  • 11:12 effie: Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - T219150
  • 11:11 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 56s)
  • 11:06 awight@deploy1001: Synchronized static/images/project-logos: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 57s)
  • 11:05 moritzm: uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) T233604
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:51 jynus: stopping db2102 mariadb to recover db
  • 09:45 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' (T233585)
  • 09:44 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' (T233585)
  • 09:38 akosiaris: T218184 upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
  • 09:02 effie: Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - T219150
  • 08:31 elukey@deploy1001: Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
  • 08:24 elukey@deploy1001: Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
  • 07:41 godog: swift run swiftrepl without deletes eqiad -> codfw
  • 07:40 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
  • 07:08 marostegui: Stop MySQL on db1123 to reboot to change binlog format and kernel - T230783
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to change binlog format T230783', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1066 from config T233071 (duration: 00m 56s)
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1066 from config T233071 (duration: 01m 15s)

2019-09-22

  • off: marostegui set s3 master RW

2019-09-21

  • 05:42 shdubsh: re-enable input-kafka-rsyslog-shipper in codfw
  • 05:33 shdubsh: drop input-kafka-rsyslog-shipper in codfw
  • 02:15 bblack: dbproxy1017: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 02:14 bblack: dbproxy1016: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 01:52 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash2004-5-6
  • 01:34 mutante: restarting mobileapps service on scb*
  • 01:34 mutante: restarted mobileapps service on scb1001
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 01:21 bblack: re-pooling cp108[78] in D2 via confctl
  • 01:14 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash1007
  • 01:08 shdubsh: removed input-kafka-rsyslog-shipper-eqiad/codfw from logstash inputs logstash1008 and logstash1009
  • 00:54 mutante: aqs1009 - systemctl restart aqs
  • 00:54 mutante: aqs1006 - systemctl restart aqs
  • 00:48 mutante: aqs1005 - systemctl restart aqs
  • 00:46 shdubsh: restarting logstash on logstash1008 without udp-localhost-eqiad/codfw configs
  • 00:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1088.eqiad.wmnet
  • 00:38 bblack: depooling confctl things in rack D2
  • 00:38 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2019-09-20

  • 21:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: fix T233453 (duration: 00m 56s)
  • 21:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: fix T233453 (duration: 00m 58s)
  • 19:26 XioNoX: update eqsin firewall filters - T233268
  • 16:35 krinkle@deploy1001: Synchronized vendor/: ead70240892e9 (duration: 00m 59s)
  • 16:14 XioNoX: update eqiad firewall filters - T233268
  • 16:11 XioNoX: update esams firewall filters - T233268
  • 15:17 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bgwiki --logwiki=metawiki 'Newrdkter' 'NRdk' (T233313)
  • 15:03 XioNoX: remove AS-PATH prepending in ams
  • 11:29 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:16 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:15 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 09:31 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:52 jynus: creating new database on m1 "bacula9" T229209
  • 08:28 hashar: Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390
  • 08:23 hashar: CI is failing since it is somehow no longer able to fetch from Gerrit T233390
  • 08:20 hashar: contint1001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 08:12 hashar: contint2001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:46 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:45 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:14 godog: eqiad-prod: start ms-be1027 decom - T233289
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic T223151', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json
  • 05:27 marostegui: Analyze table enwiki.logging on db2102 - T223151
  • 05:07 marostegui: Remove temporary index on hiwikisource views T219374
  • 01:06 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC (duration: 02m 51s)
  • 01:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/TimedMediaHandler/: T233360 Fix Safari 13.0 regression in video playback with audio (duration: 00m 58s)
  • 01:03 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC

2019-09-19

  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 ejegg: updated payments-wiki from adef0e858f to 939b771800
  • 22:34 mutante: gerrit1001 - stopping puppet, removing gerrit IP from interface, rebooting
  • 21:37 niharika29@deploy1001: Synchronized wmf-config/VariantSettings.php: Enable special:mute on testwiki; T231577 (duration: 00m 56s)
  • 20:15 XioNoX: push firewall policies to pfw3-eqiad - T233325
  • 20:07 XioNoX: push firewall policies to pfw3-codfw - T233325
  • 19:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.23 refs T220748
  • 19:02 twentyafterfour: There are currently no blockers for T220748 so I am preparing to deploy 1.34.0-wmf.23 to all wikis.
  • 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 18:14 XioNoX: add TCP-MSS 1436 to cr2-eqiad external interfaces - T232602
  • 18:12 XioNoX: add TCP-MSS 1436 to cr1-eqiad external interfaces - T232602
  • 18:01 bblack: lvs2004 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:55 mutante: puppetmaster1001 - add mcrouter cert for mw1298.eqiad.wmnet (T192457)
  • 17:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 17:48 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, 32cf50453cd (duration: 01m 04s)
  • 17:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2 (duration: 08m 52s)
  • 17:43 Krinkle: Move whisper/MediaWiki/wanobjectcache/revision_row_1/29 to whisper/MediaWiki/wanobjectcache/revision_row_1_29 on graphite1004 and graphite2003 (T232907)
  • 17:38 arlolra@deploy1001: Started deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2
  • 17:27 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:27 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/includes/libs/objectcache/wancache: 2e910c9, T232907 (duration: 01m 03s)
  • 17:23 bblack: lvs2005 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:19 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:16 bblack: lvs200[456] - puppet disabled for https://gerrit.wikimedia.org/r/536324 deploy/test
  • 17:14 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062 (duration: 05m 42s)
  • 17:08 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062
  • 16:31 _joe_: removed manually the purge_checkuser cron from mwmaint1002, to have puppet recreate it
  • 16:20 ejegg: updated fundraising CiviCRM from 90db6cb5a1 to 5def62ab05
  • 16:15 papaul: shutting down scs-a1-codfw for replacement
  • 15:26 moritzm: repooling restbase2012 after completed Cassandra bootstrap T224553
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=restbase,service=cassandra,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-backend,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-ssl,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 15:05 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:56 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286) (duration: 05m 39s)
  • 14:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286)
  • 14:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3 (duration: 10m 42s)
  • 14:37 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3
  • 14:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2 (duration: 08m 24s)
  • 14:31 mobrovac: bootstrap restbase2012-c -- T224553
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2
  • 14:28 mobrovac@deploy1001: deploy aborted: Remove the TID suffix in the ETag, if present - T230272 (duration: 11m 20s)
  • 14:28 sbassett: Deployed security patch for T224203 (php-1.34.0-wmf.23)
  • 14:19 sbassett: Deployed security patch for T224203
  • 14:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 14:18 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:17 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present - T230272
  • 13:54 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750) (duration: 03m 06s)
  • 13:51 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750)
  • 13:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/Translate: T233308 (duration: 01m 07s)
  • 13:14 moritzm: powercycling mw1300
  • 13:12 mobrovac: bootstrap restbase2012-b -- T224553
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1089 into contributions service T223151', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json
  • 13:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553 (duration: 21m 38s)
  • 12:39 mobrovac@deploy1001: Started deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553
  • 12:36 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:48 mobrovac: bootstrap restbase2012-a -- T224553
  • 11:32 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 199a05c: Add new throttle rule for Czech wiki course (T233199) (duration: 01m 01s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: eab7c6a: c80f026: GrowthExperiments: GrowthExperiments: Enable Special:Homepage for euwiki, GrowthExperiments: Enable help panel for euwiki (T233066, T233065) (duration: 01m 05s)
  • 09:54 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: security T207094 (duration: 01m 02s)
  • 09:53 urbanecm@deploy1001: sync-file aborted: security T207094 (duration: 00m 28s)
  • 09:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: security T207094 (duration: 01m 05s)
  • 09:22 godog: power back on ms-be1027, found with power off
  • 08:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 393441b: Change configuration of AbuseFilter extension for enwikisource (T231750) (duration: 01m 04s)
  • 08:22 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: revert T207094 (duration: 01m 04s)
  • 08:20 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: security T207094 (duration: 01m 06s)
  • 08:11 marostegui: Rename tables on db1133:labspuppet T233281
  • 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:40 moritzm: rebooting failoid1001 for kernel update
  • 07:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give more logpager weight to db1089 T223151', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json
  • 07:01 moritzm: reimaging restbase2012 to stretch T224553
  • 06:18 marostegui: Sanitize hiwikisource on db1124:3313 and db2094:3313 T219374
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager T223151', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json
  • 05:11 marostegui: Stop MySQL on db2055 for decommission T233186
  • 05:11 marostegui: Remove db2055 from tendril and zarcillo T233186

2019-09-18

  • 23:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MobileFrontend/resources/dist/: T233260, 1667ed9 (duration: 01m 04s)
  • 22:58 cmjohnson1: enabled asw2-c-eqiad interface xe-2/0/45
  • 22:40 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/resources/Resources.php: d6dadfd (duration: 01m 03s)
  • 22:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, ff44043efa59e9 (duration: 01m 05s)
  • 22:13 cmjohnson1: disabling asw2-c-eqiad xe-2/0/45 - cr1-eqiad to replace optic T233265
  • 21:54 gilles: T233095 Purging all eswiki articles (both desktop and mobile this time)
  • 21:53 gilles@deploy1001: Synchronized php-1.34.0-wmf.22/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 01m 04s)
  • 21:13 XioNoX: enable damping on primary codfw-eqiad link - T196432
  • 21:09 XioNoX: enable damping on codfw-ulsfo link - T196432
  • 20:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No longer load InitialiseSettings at all in CommonSettings (duration: 01m 03s)
  • 20:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Quick fix for wmfLoadInitialiseSettings() (duration: 01m 03s)
  • 20:40 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 20:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out call to InitialiseSettings.php (duration: 01m 04s)
  • 20:18 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Drop support for serialised PHP (duration: 01m 04s)
  • 20:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP T223602 (duration: 01m 04s)
  • 20:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:07 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208246 Enforce a 10-byte password for privileged users (duration: 01m 04s)
  • 19:57 urandom: decommissioning Cassandra, restbase2012-c -- T224553
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:42 gilles: T233095 Purging all pages on eswiki
  • 19:27 joal@deploy1001: Finished deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix (duration: 03m 40s)
  • 19:24 mutante: ganeti1001 - deleting krypton.eqiad.wmnet - decom T231546
  • 19:23 joal@deploy1001: Started deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.23 refs T220748 (duration: 01m 04s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.23 refs T220748
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:07 twentyafterfour: There appear to be no blockers on T220748 so I'll proceed with deploying 1.34.0-wmf.23 to group 1.
  • 19:01 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix (duration: 02m 12s)
  • 18:59 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix
  • 18:55 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train (duration: 01m 05s)
  • 18:54 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train
  • 18:46 XioNoX: remove `border-in4 term ddos-0906` from all routers
  • 17:53 Amir1: Creating hiwikisource is done
  • 17:50 urandom: decommissioning Cassandra, restbase2012-b -- T224553
  • 17:48 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 32s)
  • 17:45 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Add hiwikisource logos (T218155) (duration: 01m 04s)
  • 17:43 ladsgroup@deploy1001: Synchronized wmf-config/VariantSettings.php: Add hiwikisource (T218155) (duration: 01m 05s)
  • 17:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hiwikisource (T218155) (duration: 01m 04s)
  • 17:38 Amir1: manual write on hiwikisource "wikiadmin@10.64.0.205(hiwikisource)> update text set old_text = 'DB://cluster25/1';" (T218155)
  • 17:33 Amir1: mwscript maintenance/createAndPromote.php --wiki=hiwikisource --force --sysop Ladsgroup (T218155)
  • 17:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:22 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 06s)
  • 17:22 Jeff_Green: authdns-update to deploy DNS for new fundraising host
  • 17:03 mutante: ganeti2004 - resetting DRAC in an attempt to make IPMI work again
  • 17:00 Urbanecm: Morning SWAT done
  • 16:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable DNS blacklist on testwiki temporarily (T230822) (duration: 01m 03s)
  • 16:43 Urbanecm: 8340be9 sync is for T230822, mistakenly inserted `test` instead of the task number
  • 16:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8340be9: Enable logging for BlockManager channel at info level (test) (duration: 01m 04s)
  • 16:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: dc1298d: Add Draft and Draft_talk aliases for wikis that define draft namespace (T223472) (duration: 01m 02s)
  • 16:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 6e59651: Disable FundraiserLandingPage extension on test.wikipedia.org (T203020) (duration: 01m 04s)
  • 16:26 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/tewikisource.png (T232065)
  • 16:25 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 2/2) (duration: 01m 06s)
  • 16:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 1/2) (duration: 01m 05s)
  • 16:18 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 817d679: Turn on EventLogging at 100% for DonateWiki (T233145) (duration: 01m 04s)
  • 16:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: ba30276: Add suppressredirect right to filemovers on bnwiki (T233137) (duration: 01m 05s)
  • 15:55 moritzm: repooling restbase2011 after reimage/bootstrap
  • 15:53 urandom: decommissioning Cassandra, restbase2012-a -- T224553
  • 15:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:59 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-backend
  • 14:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 joal@deploy1001: Finished deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train (duration: 05m 28s)
  • 13:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 joal@deploy1001: Started deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 hashar: Restarting Jenkins, starting Zuul
  • 12:56 marostegui: Deploy schema change on the following s6 hosts: db1088, db1093, db1096, db1098, db1139, dbstore1005 - T231172
  • 12:52 hashar: gracefully stopping Zuul (kill SIGUSR1) to prepare for Jenkins restart
  • 12:40 marostegui: Deploy schema change on s6 codfw master with replication T231172
  • 12:18 vgutierrez: restarting ats-tls to avoid spreading Proxy-Connection header - T233205
  • 12:03 marostegui: Stop haproxy on dbproxy1006 - T233207
  • 11:29 mobrovac: bootstrap restbase2011-c -- T224553
  • 11:27 awight: EU SWAT complete
  • 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 00m 59s)
  • 11:25 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: NowCommons test & test2wiki configuration (T228851) (duration: 01m 15s)
  • 10:17 onimisionipe: force relocation of shards for eqiad search(chi) cluster
  • 10:16 moritzm: restarting postgres on puppetdb1002/2002 after updating permissions for replication user
  • 10:00 mobrovac: bootstrap restbase2011-b -- T224553
  • 09:37 godog: run swiftrepl eqiad -> codfw on all containers, no deletes
  • 09:37 effie: upgrading netmon* to PHP 7.2.22 T230024
  • 09:35 godog: run swiftrepl eqiad -> codfw for transcoded containers
  • 08:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9125 and previous config saved to /var/cache/conftool/dbconfig/20190918-085721-marostegui.json
  • 08:22 mobrovac: bootstrap restbase2011-a -- T224553
  • 07:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 07:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:43 moritzm: reimaging restbase2011 to stretch T224553
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P9124 and previous config saved to /var/cache/conftool/dbconfig/20190918-060401-marostegui.json
  • 05:58 marostegui: Deploy schema change on db2097:3316 - T233135
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool host after onsite checks T233184', diff saved to https://phabricator.wikimedia.org/P9123 and previous config saved to /var/cache/conftool/dbconfig/20190918-054755-marostegui.json
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2055 from config T233186 (duration: 01m 04s)
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2055 from config T233186 (duration: 01m 06s)
  • 05:03 marostegui: Start MySQL on db2127 T233184
  • 03:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.util/: 0333729e, ccfe88241 (duration: 01m 07s)

2019-09-17

  • 23:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.23 refs T220748
  • 23:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/VisualEditor/extension.json: aae62a8 (duration: 01m 05s)
  • 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 22:43 dzahn@cumin1001: Updating IPMI password on 6 hosts - dzahn@cumin1001
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add comment about MinimumPasswordLengthToLogin (duration: 01m 03s)
  • 21:45 cstone: civicrm revision changed from 45dbfdb96f to 90db6cb5a1
  • 21:45 tzatziki: removed one file for legal compliance
  • 21:12 XioNoX: delete AS13335 91.198.174.0/24 RPKI/ROA
  • 21:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 21:10 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 21:10 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:08 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:07 twentyafterfour@deploy1001: Finished scap: testwikis to 1.34.0-wmf.23 refs T220748 (duration: 24m 55s)
  • 21:01 XioNoX: enable interface damping on primary eqiad-esams link (eqiad side) - T196432
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:47 dzahn@cumin1001: Updating IPMI password on 660 hosts - dzahn@cumin1001
  • 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:42 twentyafterfour@deploy1001: Started scap: testwikis to 1.34.0-wmf.23 refs T220748
  • 20:39 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:31 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/phpCharToUpper.json: 8372dcd (duration: 00m 56s)
  • 20:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/Title.js: 8372dcd (duration: 02m 08s)
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 21 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 tzatziki: changing email for User:Olag
  • 20:12 dzahn@cumin1001: Updating IPMI password on 18 hosts - dzahn@cumin1001
  • 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:04 dzahn@cumin1001: Updating IPMI password on 29 hosts - dzahn@cumin1001
  • 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:32 ejegg: updated payments-wiki from fc82318180 to adef0e858f
  • 19:26 dzahn@cumin1001: Updating IPMI password on 543 hosts - dzahn@cumin1001
  • 19:25 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:22 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:20 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:14 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:08 twentyafterfour: Branch cut is in progress for 1.34.0-wmf.23
  • 19:05 urandom: decommissioning Cassandra, restbase2011-c -- T224553
  • 18:06 papaul: upgrading firmware on scs1-a1-codfw
  • 17:18 ejegg: updated SmashPig payments listener from a0151434f4 to dc0c6b208b
  • 17:09 urandom: decommissioning Cassandra, restbase2011-b -- T224553
  • 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 17:00 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 16:04 jbond42: run octocatalog-diff from elnath with current facts
  • 15:55 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 55s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 15:39 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:38 urandom: decommissioning Cassandra, restbase2011-a -- T224553
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Host down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9120 and previous config saved to /var/cache/conftool/dbconfig/20190917-151714-marostegui.json
  • 15:16 marostegui: Stop MySQL on db2127 and shut the host down for onsite maintenance
  • 14:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 14:52 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on wikitech for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 8 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 7 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 6 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 5 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 4 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on remaining section 3 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 2 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 1 wikis for T232464
  • 14:48 anomie@mwmaint1002: Running cleanupRevActorPage.php on test wikis and mediawikiwiki for T232464
  • 14:39 anomie@deploy1001: Synchronized php-1.34.0-wmf.22/includes/MergeHistory.php: Backport MergeHistory fix for T232464 gerrit:537436 (duration: 00m 54s)
  • 14:35 ottomata: bouncing eventstreams service on scb hosts
  • 14:15 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 14:03 herron: migrating kafka1003 to kafka-main1003 T225005
  • 14:00 jbond42: forcing puppet run
  • 14:00 bblack: lvs1015 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:59 bblack: lvs2003 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:57 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:52 bblack: lvs1016 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:52 bblack: lvs2006 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:45 moritzm: repooling restbase2010 after reimage/completed bootstrap
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 db1104 db1085 db1086 after PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9117 and previous config saved to /var/cache/conftool/dbconfig/20190917-132102-marostegui.json
  • 13:17 godog: force-run puppet in eqiad to update exported resources
  • 13:14 jbond42: currently running octocatalog-diff for all hosts from elnath
  • 13:02 marostegui: Start replication on db1130 db1104 db1085 db1086 after PDU maintenance is completed - T227539
  • 13:01 cmjohnson1: The PDU swap in rack B3 eqiad is finished.
  • 12:30 mobrovac: bootstrap restbase2010-c - T224553
  • 11:32 Urbanecm: EU SWAT is done
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:31 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 290e207: Add channels for the Translate and TranslationsNotification extension (T221119, T144780, T143073) (duration: 00m 56s)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:29 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:27 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Use https rather than protocol-relative remote API URLs (T228851) (duration: 00m 58s)
  • 11:24 cmjohnson1: commencing pdu swap rack b3 eqiad T227539
  • 11:22 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Update ORES filter threshold configuration for new huwiki model (T230031) (duration: 00m 55s)
  • 11:17 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable EditorJourney for euwiki (T232061) (duration: 00m 56s)
  • 11:13 Urbanecm: Run mwscript emptyUserGroup.php --wiki=aawiki 'inactive' (T150538)
  • 10:58 mobrovac: bootstrap restbase2010-b - T224553
  • 10:44 vgutierrez: replacing nginx with ATS in cp1076 (upload cluster) - T231433
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9116 and previous config saved to /var/cache/conftool/dbconfig/20190917-094827-marostegui.json
  • 09:46 marostegui: Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539
  • 09:30 hashar: Restarting CI jenkins
  • 09:29 marostegui: Downtime db1073 db1130 db1104 db1085 db1086 for the PDU maintenance T227539
  • 09:18 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 mobrovac: bootstrap restbase2010-a - T224553
  • 09:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 100% of users who accept cookies - T219150 (duration: 00m 57s)
  • 08:37 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp3034 - T231849 T232724
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1074 with just 50 to keep its warmness level just in case T231638', diff saved to https://phabricator.wikimedia.org/P9115 and previous config saved to /var/cache/conftool/dbconfig/20190917-075807-marostegui.json
  • 07:48 effie: Enable puppet on mw*
  • 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates
  • 07:41 marostegui: Stop mysql on db1063 for decommissioning T232564
  • 07:40 marostegui: Remove db1063 from puppet and zarcillo T232564
  • 07:29 vgutierrez: repooling cp5007 without wikibase configuration - T99531
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 vgutierrez: depooling cp5007 to ensure that wikibase removal goes as expected - T99531
  • 07:10 vgutierrez: getting rid of wikibase TLS certificate & nginx configuration on the text cache cluster - T99531
  • 06:56 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp2002, cp4021 and cp5001 - T231849
  • 06:55 vgutierrez: uploaded trafficserver 8.0.5-1wm8 to apt.wikimedia.org (stretch) - T231849
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1066 T233071', diff saved to https://phabricator.wikimedia.org/P9114 and previous config saved to /var/cache/conftool/dbconfig/20190917-065342-marostegui.json
  • 06:49 moritzm: reimage restbase2010 to Stretch T224553
  • 05:57 vgutierrez: upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - T232724
  • 05:56 vgutierrez: uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - T232298 T232724
  • 05:23 effie: disable puppet on mw* servers for 536979
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 master and remove read-only from s2 T230785', diff saved to https://phabricator.wikimedia.org/P9113 and previous config saved to /var/cache/conftool/dbconfig/20190917-050133-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-only for maintenance T230785', diff saved to https://phabricator.wikimedia.org/P9112 and previous config saved to /var/cache/conftool/dbconfig/20190917-050043-marostegui.json
  • 05:00 marostegui: Starting s2 failover from db1066 to db1122 - T230785
  • 04:57 effie: Downtiming HTTPS-blog on icinga - T232412
  • 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 and depool it from API T230785', diff saved to https://phabricator.wikimedia.org/P9111 and previous config saved to /var/cache/conftool/dbconfig/20190917-041441-marostegui.json
  • 04:11 marostegui: Start s2 pre-switchover steps T230785
  • 00:34 AndyRussG: updated fruec from fb29cb7407 to 97128874bf

2019-09-16

  • 23:53 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgDebugLogFile in VS (duration: 00m 55s)
  • 23:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgDebugLogFile in CS (duration: 00m 55s)
  • 23:42 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgUploadThumbnailRenderHttpCustom* in VS (duration: 00m 54s)
  • 23:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgUploadThumbnailRenderHttpCustom* in CS (duration: 00m 55s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wmgRC2UDPAddress in VS (duration: 00m 55s)
  • 23:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgRC2UDPAddress in CS (duration: 00m 56s)
  • 23:24 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgCopyUploadProxy in VS (duration: 00m 56s)
  • 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgCopyUploadProxy in CS (duration: 00m 55s)
  • 23:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T225261 T194019 Adjust CentralNotice CSP for banner previews for FR-tech (duration: 00m 55s)
  • 22:59 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use __DIR__ rather than global wmfConfigDir (duration: 00m 55s)
  • 21:48 ebernhardson: unban elastic1027 from production-search-eqiad
  • 20:55 XioNoX: remove 2 sessions to AS12871 on cr2-esams - T232617
  • 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:20 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:10 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:08 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:55 XioNoX: reboot scs-a8-eqiad (at 100% CPU)
  • 19:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:55 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:53 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:51 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:35 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:28 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:27 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:19 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:13 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:13 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:09 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:03 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgCookieSetOnAutoBlock and wgCookieSetOnIpBlock to the default; never varied (duration: 00m 56s)
  • 19:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up globals in InitialiseSettings.php (duration: 00m 56s)
  • 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:01 dzahn@cumin1001: Updating IPMI password on 0 hosts - dzahn@cumin1001
  • 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:54 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 18:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Variant configuration: Read JSON config for all wikis (duration: 00m 56s)
  • 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set MinimumPasswordLengthToLogin to 10 for all privileged groups, not just +staff (duration: 00m 56s)
  • 18:40 jforrester@deploy1001: Synchronized src/WmfClusters.php: Use static VariantSettings instead of InitialiseSettings (noc-only change) (duration: 00m 55s)
  • 18:40 mutante: phab1001 - racadm racreset
  • 18:21 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Remove globals declaration and use via GLOBALS for testability (duration: 00m 56s)
  • 18:15 Lucas_WMDE: Morning SWAT done
  • 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: bridge: enable EditTags for beta (T232582) (duration: 00m 58s)
  • 18:12 herron: migrating kafka1002 to kafka-main1002 T225005
  • 18:09 mutante: registry2001 - restarting nginx
  • 17:55 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 55s)
  • 17:49 ejegg: updated SmashPig standalone from 5d187092a7 to a0151434f4
  • 17:42 urandom: decommissioning Cassandra, restbase2010-c -- T224553
  • 17:42 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027 due to >1k orphan tasks
  • 17:09 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 54s)
  • 16:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make CommonSettings use mtime from VariantSettings (duration: 00m 55s)
  • 16:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make InitialiseSettings use values from VariantSettings (duration: 00m 54s)
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Establish VariantSettings.php everywhere (duration: 00m 56s)
  • 16:51 ebernhardson: ban elastic1027 from production-search-eqiad-chi
  • 16:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223602 Inject config object into InitialiseSettings-labs rather than use wgConf global (duration: 00m 55s)
  • 15:42 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 56s)
  • 15:41 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 08s)
  • 15:41 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602
  • 15:10 @: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 15:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:54 urandom: decommissioning Cassandra, restbase2010-b -- T224553
  • 14:37 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:25 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:09 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 13:28 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FlaggedRevs/frontend/specialpages/reports/ValidationStatistics.php: Add missing "use" to getTopReviewers() - T232618 (duration: 00m 55s)
  • 13:10 moritzm: rebooting failoid2001 for kernel update/pick up new qemu
  • 13:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.22
  • 12:59 moritzm: installing qemu security updates on stretch
  • 12:58 urandom: decommissioning Cassandra, restbase2010-a -- T224553
  • 12:44 godog: stop thumbor traffic to statsd/graphite, use Prometheus only and replace Thumbor dashboard - T205870
  • 12:40 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 12:17 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:07 _joe_: rolling restart ended on eqiad T232613
  • 11:56 _joe_: rolling restart of php-fpm in eqiad to pick up the new memcached extension T232613
  • 11:50 _joe_: rolling restart of php-fpm in codfw to pick up the new memcached extension T232613
  • 11:43 Urbanecm: EU SWAT is done
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: e37aed2: Remove expired throttle rules (duration: 01m 03s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 313e3d9: Increase move rate-limit on Commons for all autopatrolled users (T232657) (duration: 01m 05s)
  • 11:33 jbond42: update peer address of AS28598
  • 11:30 effie: Upgrading php-memcached to 3.0.1+2.2.0-1~wmf3
  • 11:30 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Send a User-Agent with remote API requests (T232840) (duration: 01m 02s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 869b56f: Lift IP cap on 2019-10-02 for Senior Citizen Write Wikipedia course - cs.wikipedia (T232831) (duration: 01m 02s)
  • 11:21 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable File Importer source wiki edits on beta cluster (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable source wiki editing for testwiki (T228851) (duration: 01m 02s)
  • 11:10 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Add debug logging for remote API failures (T228851) (duration: 01m 05s)
  • 11:06 _joe_: uploaded php-memcached_3.0.1+2.2.0-1~wmf3 to component/php72 for stretch T232613
  • 10:52 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 10:51 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 10:50 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 10:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 10:45 vgutierrez: Enabling OCSP prefetched responses for the non-canonical redirect service - T232988
  • 10:29 _joe_: installing a patched php-memcached on mw1347 T232613
  • 10:16 vgutierrez: upgrade acme-chief production servers to acme-chief 0.21 - T219765
  • 10:16 moritzm: upload libtrapperkeeper-webserver-jetty9-clojure 1.7.0-2+wmf1 to buster-wikimedia
  • 10:05 vgutierrez: restarting acmechief servers to get latest kernel upgrades
  • 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 vgutierrez: replacing nginx with ATS in cp3034 (upload cluster) - T231433
  • 08:56 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Beta: enable the Parsoid extension - T231569 (duration: 01m 01s)
  • 08:50 marostegui: Apply grants for dbproxy1021 on db1133 (m5 master) with replication - T202367
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 moritzm: installing faad2 security updates
  • 07:15 moritzm: repooling restbase2009
  • 06:48 marostegui: Stop MySQL on db1114 to upgrade it to 10.3
  • 06:04 marostegui: Stop MySQL on db2054 for decommissioning T232969
  • 06:01 marostegui: Remove db2054 from tendril and zarcillo T232969
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2054 from config T232969 (duration: 01m 03s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2054 from config T232969 (duration: 01m 05s)

2019-09-15

  • 16:51 Krinkle: Fixed a dozen abuse filters, listed at https://phabricator.wikimedia.org/T156096#5494060. The trailing pipe character, which will no longer be supported in a future version of AbuseFilter, was removed from the filters that had it.
  • 14:35 _joe_: test: setting opcache.interned_strings_buffer to 0 on mw1348 for T232613

2019-09-14

  • 23:42 onimisionipe: force shard allocation (dewiki_content_1566659363[4]) on eqiad cluster
  • 04:39 effie: Depool and reload mw1286
  • 01:14 ejegg: updated fundraising python tools from 1e405864d7 to e1b81688c6
  • 00:29 ejegg: updated payments-wiki from 1f556670cf to fc82318180

2019-09-13

  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 gehel: re-enable puppet on maps - T232817
  • 20:23 chaomodus: restarting netbox1001.wikimedia.org
  • 20:00 twentyafterfour: hotfixing T232600 due to severity of the bug and relative safety of the fix (if this breaks, yell at James_F who twisted my arm and made me do it)
  • 19:54 urandom: bootstrapping Cassandra, restbase2009-c -- T224553
  • 17:24 urandom: bootstrapping Cassandra, restbase2009-b -- T224553
  • 16:10 XioNoX: fix bgp group netflow on cr2-codfw
  • 15:47 urandom: bootstrapping Cassandra, restbase2009-a -- T224553
  • 15:43 effie: reverting live hacks on mw1348
  • 15:34 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable adhoc core dump logging - T232613 (duration: 01m 04s)
  • 15:14 akosiaris: upload apertium-dan_0.6.0-1+wmf3 apertium-nno_1.0.0-1+wmf1 apertium-nob_1.0.0-2+wmf1 apertium-swe_0.8.0-1+wmf1 to apt.wikimedia.org/jessie-wikimedia T218184
  • 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:02 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Add more log and context for T232613 logging - T232613 (duration: 01m 04s)
  • 15:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 moritzm: installing cups security update on buster (only client-side libs installed)
  • 14:22 moritzm: installing bzip2 update from Buster 10.1 point release
  • 14:18 moritzm: installing reportbug update from Buster 10.1 point release
  • 14:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:05 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:57 oblivian@deploy1001: Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s)
  • 13:28 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:27 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:21 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:20 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:19 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:56 _joe_: banning more urls on maps1003
  • 12:37 _joe_: temp ban of class of urls on maps1003 nginx
  • 12:14 jbond42: add timing information to maps1003 access logs
  • 11:39 jbond42: enable access logs on maps1003
  • 11:38 _joe_: manually raising the worker heap limit to 600 MB on kartotherian on maps1003
  • 11:11 elukey: reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades
  • 11:10 elukey: reboot an-tool1007 (runs turnilo) for kernel upgrades
  • 11:08 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 godog: silence kartotherian pages for 2h, known issue
  • 10:47 vgutierrez: rebooting acmechief-test servers to catch up latest kernel upgrades
  • 10:42 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:41 moritzm: reimage restbase2009 to stretch T224553
  • 10:38 moritzm: repool restbase1018 after reimage to stretch and completed Cassandra bootstrap
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:13 vgutierrez: disable ATS-TLS debug options on cp5001 - T232298
  • 10:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 09:46 gehel: re-enabling /geoline on maps1004 - T232817
  • 09:45 @: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:44 @: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:40 godog: install linux-perf-4.9 on maps1002 and attempt to capture a stack sample
  • 09:38 gehel: drop /geoshape and restart kartotherian on maps1004 - T232817
  • 09:27 gehel: restart kartotherian on maps1004 - T232817
  • 09:24 gehel: deny access to /geoline on maps1004 - T232817
  • 09:11 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 09:08 godog: downtime kartotherian pages for 1h in codfw
  • 09:01 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet
  • 09:00 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet
  • 08:57 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:52 godog: downtime kartotherian pages for 1h
  • 08:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 08:48 jmm@cumin2001: Updating IPMI password on 1 hosts - jmm@cumin2001
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:47 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:45 gehel: stop tilerator on maps to help reduce load
  • 08:37 _joe_: rolling restart of kartotherian
  • 08:33 _joe_: restarting kartotherian on maps1003, all workers seem stuck
  • 05:58 oblivian@deploy1001: Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s)
  • 05:40 _joe_: live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken
  • 05:17 effie: Rolling restart php-fpm across the fleet for 536400
  • 04:53 vgutierrez: restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849
  • 04:50 eileen: process-control config revision is 43a2677bcf - turned off gender import
  • 02:23 eileen: civicrm revision changed from c5ab5aea9e to 45dbfdb96f, config revision is 1da8391a9a
  • 01:09 XioNoX: add IPv6 sampling to cr1-eqiad
  • 01:07 XioNoX: enable netflow sampling on cr2-codfw

2019-09-12

  • 23:35 XioNoX: enable netflow sampling on cr1-codfw
  • 23:21 urandom: decommissioning Cassandra, restbase2009-b -- T224553
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Read config from JSON, not serialised PHP on testwiki (duration: 01m 03s)
  • 23:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: T223602 Add ability to read config from JSON, not serialised PHP (duration: 01m 04s)
  • 23:10 eileen: process-control config revision is 1da8391a9a
  • 22:53 ayounsi@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:43 ayounsi@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:43 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:20 XenoRyet: payments-wiki updated from 4ebbdb247d to 1f556670cf
  • 22:14 XioNoX: remove extra prepend in AMS-IX
  • 21:18 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Hardcode posix signal and log coredump - T232613 (duration: 01m 04s)
  • 21:17 mbsantos@deploy1001: Finished deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0 (duration: 03m 18s)
  • 21:14 mbsantos@deploy1001: Started deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0
  • 21:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0 (duration: 03m 52s)
  • 21:09 mbsantos@deploy1001: Started deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0
  • 21:00 urandom: decommissioning Cassandra, restbase2009 -- T224553
  • 20:33 krinkle@deploy1001: Synchronized wmf-config/: d495d5e24949 (duration: 01m 03s)
  • 20:28 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: d495d5e24949 (duration: 01m 04s)
  • 20:27 eileen: civicrm revision changed from 4075e396d5 to f00c6482bf, config revision is 635f198b92
  • 20:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only (duration: 01m 02s)
  • 20:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 04s)
  • 20:02 moritzm: installing firmware-nonfree update from Buster 10.1 point release
  • 19:51 moritzm: installing systemd bugfix update from Buster 10.1 point release
  • 19:44 moritzm: installing 4.19.67 kernel from 10.1 point release on Buster systems
  • 19:34 urandom: bootstrapping Cassandra, restbase1018-c -- T224553
  • 18:59 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable coredump on some mysterious php7.2 failure - T232613 (duration: 01m 04s)
  • 18:32 moritzm: installing gdb updates from buster 10.1 point release
  • 18:28 bblack: lvs1016: restart pybal to revert test
  • 18:21 bblack: lvs1016: restart pybal to test dual bgp peering
  • 18:04 bblack: lvs1015: restart pybal to return BGP session to cr2 - T226424
  • 18:03 bblack: lvs1014: restart pybal to return BGP session to cr2 - T226424
  • 17:58 XioNoX: revert VRRP priority change cr2-eqiad - T226424
  • 17:54 XioNoX: revert OSPF priority change on cr2-eqiad - T226424
  • 17:53 XioNoX: re-enabled external BGP on cr2-eqiad - T226424
  • 17:46 urandom: bootstrapping Cassandra, restbase1018-b -- T224553
  • 17:43 XioNoX: reboot cr2-eqiad - T226424
  • 17:40 XioNoX: failover cr2-eqiad master RE from RE1 to RE0 - T226424
  • 17:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: T232613 Add ability to core dump on empty string array key that should exist (wmf.22 only, flagged off) (duration: 01m 03s)
  • 17:31 XioNoX: power off re0.cr2-eqiad - T226424
  • 17:25 XioNoX: failover cr2-eqiad master RE from RE0 to RE1 - T226424
  • 17:19 halfak@deploy1001: Finished deploy [ores/deploy@7d45b80]: T232660 (duration: 13m 41s)
  • 17:05 halfak@deploy1001: Started deploy [ores/deploy@7d45b80]: T232660
  • 17:04 XioNoX: power off re1.cr2-eqiad - T226424
  • 17:02 moritzm: installing unzip security updates on buster
  • 17:00 XioNoX: +1000 metric to all transport to/from cr2-eqiad - T226424
  • 16:57 moritzm: installing libxslt security updates on buster
  • 16:49 XioNoX: Deactivate IX/transit/private-peer v4/v6 BGP on cr2-eqiad - T226424
  • 16:47 moritzm: installing NSS security updates on buster
  • 16:42 XioNoX: er, switch VRRP master to cr1-eqiad - T226424
  • 16:42 XioNoX: switch VRRP master to cr2-eqiad - T226424
  • 16:36 bblack: lvs1013: restart pybal to move bgp session to cr1 - T226424
  • 16:36 bblack: lvs1014: restart pybal to move bgp session to cr1 - T226424
  • 16:35 bblack: lvs1015: restart pybal to move bgp session to cr1 - T226424
  • 16:34 bblack: lvs1016: restart pybal to move bgp session to cr1 - T226424
  • 16:19 XioNoX: rollback force VRRP backup on cr1-eqiad - T226424
  • 16:16 XioNoX: activate CF tunnel on cr1-eqiad - T226424
  • 16:16 XioNoX: activate transit4/6 on cr1-eqiad - T226424
  • 16:09 urandom: bootstrapping Cassandra, restbase1018-a -- T224553
  • 16:04 XioNoX: reboot cr1-eqiad - T226424
  • 16:01 XioNoX: force offline/online of FPC3 on cr1-eqiad
  • 15:45 XioNoX: failover master RE from RE1 to RE0 on cr1-eqiad - T226424
  • 15:39 XioNoX: deactivate transit4/6 on cr1-eqiad - T226424
  • 15:31 XioNoX: shutdown re0.cr1-eqiad - T226424
  • 15:23 XioNoX: failover master RE from RE0 to RE1 on cr1-eqiad - T226424
  • 15:13 XioNoX: shutdown re1.cr1-eqiad - T226424
  • 15:05 XioNoX: disable primary tunnel to CF in eqiad (for real this time, I did see an uptake of traffic on backup link before the rollback)
  • 15:03 XioNoX: rolled back disable primary tunnel to CF in eqiad
  • 15:02 XioNoX: disable primary tunnel to CF in eqiad
  • 14:53 bblack: restart pybal on lvs1013 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:50 bblack: restart pybal on lvs1016 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:45 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:41 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:39 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:37 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:29 XioNoX: ensure cr1-eqiad is vrrp backup for all groups - T226424
  • 13:22 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:57 effie: restarting hhvm on mw1233 and repooling
  • 12:56 effie: depool mw1233
  • 12:38 moritzm: reimaging restbase1018 to stretch
  • 12:03 Amir1: EU SWAT is done
  • 12:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q20mio (T225055) (duration: 01m 31s)
  • 11:11 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:11 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:00 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:42 jynus: compressing tables on labsdb1012 T232446
  • 08:22 vgutierrez: upgrading to acme-chief 0.21 on acmechief-test instances - T219765
  • 08:17 vgutierrez: restarting pybal on lvs1015 and lvs2003 - T176875
  • 08:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wdqs,service=wdqs-heavy-queries
  • 08:11 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=puppetmaster1001.eqiad.wmnet,service=wdqs-heavy-queries
  • 08:07 vgutierrez: restarting pybal on lvs2006 - T176875
  • 08:02 vgutierrez: restarting pybal on lvs1016 - T176875
  • 07:45 vgutierrez: uploaded acme-chief 0.21 to apt.wikimedia.org (buster) - T219765
  • 06:51 vgutierrez: restarting ATS-TLS on cp4021 and cp2002 to get the new SSL session cache size - T232298
  • 06:00 marostegui: Stop MySQL on db1073 for decommission T231892
  • 05:59 marostegui: Remove db1073 from tendril and zarcillo T231892
  • 05:26 _joe_: restarting strongswan on all eqiad caches that need it
  • 05:23 _joe_: restarting strongswan on cp1077
  • 03:37 eileen: civicrm revision changed from 32cd5e4953 to 4075e396d5, config revision is 3e22a80bc8
  • 02:13 eileen: civicrm revision changed from 53aeba6318 to 32cd5e4953, config revision is 3e22a80bc8
  • 02:03 XioNoX: repooling ulsfo

2019-09-11

  • 23:50 ejegg: updated payments-wiki from 5432f9c3a4 to 4ebbdb247d
  • 23:20 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.197` on cr2-eqiad
  • 22:43 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.196` on cr1-eqiad
  • 22:36 XioNoX: add BGP session between cr2-eqord and netflow1001
  • 22:30 urandom: decommissioning Cassandra, restbase1018-c -- T224553
  • 20:57 urandom: bootstrapping Cassandra, restbase-dev1005-b -- T224554
  • 20:21 ottomata: stopped and removed eventlogging-service-eventbus - T232122
  • 20:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@522177f]: Clean up old event style support (duration: 01m 39s)
  • 20:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@522177f]: Clean up old event style support
  • 20:07 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049 (duration: 00m 53s)
  • 20:06 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049
  • 18:43 urandom: decommissioning Cassandra, restbase1018-b -- T224553
  • 18:42 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211124 ed8dd7aad9e5 (duration: 01m 04s)
  • 18:42 nuria@deploy1001: Finished deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect (duration: 08m 39s)
  • 18:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op ed8dd7aad9e5 (duration: 01m 06s)
  • 18:37 krinkle@deploy1001: Synchronized tests/: no-op ed8dd7aad9e5 (duration: 01m 05s)
  • 18:33 nuria@deploy1001: Started deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect
  • 18:16 krinkle@deploy1001: Synchronized wmf-config/logging.php: d6865e3365e8 - T211124 (duration: 01m 04s)
  • 18:16 nuria@deploy1001: Finished deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery (duration: 01m 21s)
  • 18:15 nuria@deploy1001: Started deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery
  • 18:02 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/WikimediaMaintenance/blameStartupRegistry.php: (no justification provided) (duration: 01m 05s)
  • 17:57 XioNoX: upgrade librenms to 1.55
  • 17:43 ayounsi@deploy1001: Finished deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599 (duration: 00m 09s)
  • 17:42 ayounsi@deploy1001: Started deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599
  • 17:32 bblack: enable GRE MTU mitigation on eqsin caches (cp5xxx) - T232602
  • 17:27 bblack: restbase2009 - re-pool - T227408
  • 17:07 bblack: restbase2009 - shutdown for hardware work - T227408
  • 17:05 bblack: restbase2009 - depool for hardware work - T227408
  • 16:57 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c0fd061: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 02s)
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka100[23]
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka-main1001
  • 16:50 bblack: manually removed decommed eventbus LVS IP on kafka-main200[23]
  • 16:49 bblack: manually removed decommed eventbus LVS IP on kafka-main2001
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6007fbc: [rowiki] Allow sysops to remove patrollers (T231099) (duration: 01m 03s)
  • 16:39 urandom: decommissioning Cassandra, restbase1018-a -- T224553
  • 16:38 Urbanecm: Run mwscript emptyUserGroup.php --wiki=fawiki OTRS-member (T232554)
  • 16:36 bblack: ran conftool-merge on puppetmaster1001 (manually from sudo -i, to fixup missing updates)
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 76991f2: Remove OTRS-member usergroup from fawiki (T232554) (duration: 01m 05s)
  • 16:32 Urbanecm: mwscript importImages.php --wiki=commonswiki --user=Abbe98 --comment-ext=txt /home/urbanecm/T232346
  • 16:31 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c45d6d0: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 03s)
  • 16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 565fafa: Set noindex for user and user_talk on zhwiki (T231982) (duration: 01m 05s)
  • 16:24 urandom: bootstrapping Cassandra, restbase-dev1005-a -- T224554
  • 16:16 bblack@cumin1001: conftool action : set/pooled=no; selector: cluster=eventbus
  • 16:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 510aa6b: Add new whitelist rule for Université de Lorraine course (T232596) (duration: 01m 04s)
  • 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: eceaccf: Add autopatrolled user group to az.wikibooks (T231493) (duration: 01m 06s)
  • 15:52 bblack: lvs1015 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:51 bblack: lvs2003 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:49 bblack: lvs1016 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:48 bblack: lvs2006 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:03 bblack: downtimed dns-discovery confd health checks for eventbus - T232122
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.22 (duration: 01m 02s)
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.22
  • 12:48 moritzm: upgrade labpuppetmaster* to use facter 3 / puppet 5
  • 12:40 moritzm: removing now obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 12:40 moritzm: removing now puppet/puppetdb packages from labpuppetmaster* T171188
  • 11:59 hashar: Restarting Gerrit due to deadlock in the account cache # T224448
  • 11:57 bblack: applying GRE MTU -> MSS fixup to cobalt and gerrit2001 - T218184
  • 11:41 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.21/maintenance/getReplicaServer.php: SWAT: maintenance/getReplicaServer.php: Remove reference to long-deleted config var (T232268) (duration: 01m 04s)
  • 11:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AMC Outreach modal (T231436) (duration: 01m 04s)
  • 11:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q10mio (T225055) (duration: 01m 03s)
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: TR: set WikibaseTaintedReferencesEnabled true on labs wikidatawiki (T232191) (duration: 01m 03s)
  • 10:57 mobrovac: drop the wiktionary definition keyspace - T231361
  • 10:23 moritzm: removed roentgenium/tureis in Ganeti T224559
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:17 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:01 jynus: stopping and upgrading db1074
  • 09:56 jynus: upgrading mariadb client library on mariadb root clients
  • 09:46 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 50% - T219150 (duration: 01m 03s)
  • 09:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a (duration: 12m 15s)
  • 09:32 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a
  • 09:32 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3 (duration: 13m 18s)
  • 09:19 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3
  • 09:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2 (duration: 03m 59s)
  • 09:13 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2
  • 09:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449 (duration: 03m 24s)
  • 09:08 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449
  • 08:36 mobrovac@deploy1001: Finished deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361 (duration: 02m 13s)
  • 08:34 mobrovac@deploy1001: Started deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361
  • 08:24 mobrovac@deploy1001: Finished deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition (duration: 00m 34s)
  • 08:24 mobrovac@deploy1001: Started deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition
  • 08:22 mobrovac@deploy1001: Finished deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361 (duration: 02m 45s)
  • 08:19 mobrovac@deploy1001: Started deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361
  • 08:13 elukey: add thirdparty/amd-rocm271 to buster-wikimedia and update it with ROCm 2.7.1 packages
  • 08:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:07 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm27 (not used anymore)
  • 08:07 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P9088 and previous config saved to /var/cache/conftool/dbconfig/20190911-080450-marostegui.json
  • 07:52 moritzm: reimaging restbase-dev1005 to Stretch T224554
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9087 and previous config saved to /var/cache/conftool/dbconfig/20190911-075139-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9086 and previous config saved to /var/cache/conftool/dbconfig/20190911-073335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9085 and previous config saved to /var/cache/conftool/dbconfig/20190911-072344-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9084 and previous config saved to /var/cache/conftool/dbconfig/20190911-071450-marostegui.json
  • 07:07 marostegui: Stop MySQL on db1122 to reboot for a kernel upgrade T230785
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to reboot for kernel upgrade T230785', diff saved to https://phabricator.wikimedia.org/P9083 and previous config saved to /var/cache/conftool/dbconfig/20190911-070635-marostegui.json
  • 07:00 hashar: Restarting Gerrit - T224448
  • 06:58 hashar: Restarting Gerrit
  • 06:45 marostegui: Drop unused database puppet on m1 - T231539
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9082 and previous config saved to /var/cache/conftool/dbconfig/20190911-061924-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9081 and previous config saved to /var/cache/conftool/dbconfig/20190911-061659-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2048, will be decommissioned T230106', diff saved to https://phabricator.wikimedia.org/P9080 and previous config saved to /var/cache/conftool/dbconfig/20190911-054855-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P9079 and previous config saved to /var/cache/conftool/dbconfig/20190911-054753-marostegui.json
  • 05:29 marostegui: Switchover s1 codfw master db2048 -> db2112 T230106
  • 03:31 eileen: civicrm revision changed from b343642c76 to 53aeba6318, config revision is 3e22a80bc8

2019-09-10

  • 20:46 ejegg: updated payments-wiki from 15baf7f58b to 5432f9c3a4
  • 20:24 XioNoX: add MSS clamp on install1002 - T232456
  • 20:20 XioNoX: add MSS clamp on archiva1001 - T232456
  • 18:42 herron: rolling out "Aggregate IPsec Tunnel Status" icinga check, please disregard for the time being if it alerts
  • 18:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T229863 Remove EventBusRCFeedEngine eventServiceName (duration: 01m 05s)
  • 18:15 XioNoX: rollback test add static route on bast3002 to force advmss
  • 18:10 XioNoX: test add static route on bast3002 to force advmss
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/logging.php: T232042 Direct Parsoid/PHP rt-testing log events to a different target (duration: 01m 02s)
  • 17:56 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: T232122 Stop setting production value for eventlogging-service (duration: 01m 00s)
  • 17:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232122 Remove use of eventlogging-service (duration: 01m 03s)
  • 17:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-sync for safety after scap errored with a broken pipe (duration: 01m 03s)
  • 17:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write to static (JSON) as well as serialised cache for testwiki T223602 (duration: 01m 02s)
  • 17:29 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Be able to write to static (JSON) as well as serialised cache (duration: 01m 03s)
  • 16:35 elukey: reboot analytics-tool1001 via ganeti gnt - not reachable via ssh
  • 16:24 urandom: disabling reserved space on restbase-dev1005:/dev/mapper/restbase--dev1005--vg-srv -- T224554
  • 16:10 marostegui: Failover m1 from db1063 to db1135 - T231403
  • 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set items term store on write both for all of Wikidata" (duration: 01m 02s)
  • 15:58 thcipriani: restarting gerrit (again) https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&from=1568109359163&to=1568130959163&var-Application=&var-Window=30m due to T224448
  • 15:39 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.22
  • 15:37 marostegui: Start pre-switchover for m1 steps T231403
  • 15:35 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: Revert "Improve MultiHttpClient connection concurrency and reuse" - T232487 (duration: 00m 55s)
  • 15:33 reedy@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: T232487 (duration: 00m 55s)
  • 15:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 to 1.34.0-wmf.22 # T220747
  • 14:48 hashar@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:45 akosiaris: repool cp1075 ats-be, releases cert updated
  • 14:44 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 14:44 XioNoX: depool ulsfo for DC UPS power maintenance (see maint-announce)
  • 14:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:32 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747 (duration: 34m 03s)
  • 14:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:29 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:26 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 ottomata: increasing max_body_size to 10mb for all eventgate services - T232362
  • 14:14 akosiaris: depool cp1075 ats-be to test helmfile sync
  • 14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 13:58 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747
  • 13:56 hashar: Applied security patches to 1.34.0-wmf.22 # T220747
  • 13:53 hashar: scap prep 1.34.0-wmf.22 # T220747
  • 13:34 elukey: reboot stat1005 to clear inconsistent process state after tensorflow tests
  • 13:23 hashar: ./make-wmf-branch -n 1.34.0-wmf.22 -o master -c extensions/CharInsert # T220747
  • 13:12 thcipriani: restarting gerrit
  • 13:11 hashar: Gerrit experiencing difficulty due to ongoing wmf branch cut - T231872
  • 13:01 moritzm: copied prometheus-jmx-exporter to buster-wikimedia (from stretch-wikimedia, just a package with some jars)
  • 12:40 cmjohnson1: the new pdus are racked in b6
  • 12:14 cmjohnson1: removing power from ps1-b6 side B...mgmt should not be affected
  • 11:20 cmjohnson1: swapping the PDU in rack B6 eqiad T227541
  • 11:09 Urbanecm: EU SWAT done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)
  • 11:07 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,dc=eqiad,name=cp1075.eqiad.wmnet
  • 11:06 ema: cp1075: set weight in etcd back to 100
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6afe963: Set items term store on write both for all of Wikidata (T225055) (duration: 00m 55s)
  • 10:51 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:32 vgutierrez: repool cp5001 with ats-tls collecting memory usage details every hour - T232298
  • 09:56 elukey: restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck)
  • 09:50 moritzm: installing ghostscript security updates on jessie
  • 09:37 moritzm: added jbond as chanserv ops for #wikimedia-operations
  • 08:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:42 moritzm: reimaging mw2231 after hardware maintenance T231192
  • 07:21 moritzm: iron.wikimedia.org is no longer a bastion host
  • 06:57 moritzm: upgrading snapshot* to PHP 7.2.22 T230024
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1073 from config T231892 (duration: 00m 54s)
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1073 from config T231892 (duration: 00m 55s)
  • 05:35 marostegui: Stop MySQL on db2047 T231852
  • 05:35 marostegui: Remove db2047 from tendril and zarcillo - T231852
  • 05:33 urandom: decommissioning Cassandra, restbase-dev1005-b -- T224554
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1104 into API T230762', diff saved to https://phabricator.wikimedia.org/P9071 and previous config saved to /var/cache/conftool/dbconfig/20190910-051529-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 master and remove read-only from s8 T227062', diff saved to https://phabricator.wikimedia.org/P9070 and previous config saved to /var/cache/conftool/dbconfig/20190910-050213-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 as read-only for maintenance T230762', diff saved to https://phabricator.wikimedia.org/P9069 and previous config saved to /var/cache/conftool/dbconfig/20190910-050046-marostegui.json
  • 05:00 marostegui: Starting s8 failover from db1104 to db1109 - T227062
  • 04:46 vgutierrez: depool cp5001 for memory leak debugging on ATS - T232298
  • 04:23 marostegui: Start topology changes on s8, connect everything under db1109 - T230762
  • 04:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 and depool it from API T230762', diff saved to https://phabricator.wikimedia.org/P9068 and previous config saved to /var/cache/conftool/dbconfig/20190910-042243-marostegui.json
  • 04:18 marostegui: Start s8 (wikidata) pre switchover steps T230762
  • 00:59 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 00:59 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 00:57 Krinkle: krinkle@deploy1001: Deploy performance/navtiming f2a0863 - T226539
  • 00:41 urandom: decommissioning Cassandra, restbase-dev1005-a -- T224554

2019-09-09

  • 23:44 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/skins/MinervaNeue/: T232260 (duration: 00m 57s)
  • 22:28 ejegg: updated payments-wiki from 51d9ed79b6 to 15baf7f58b
  • 20:50 urandom: bootstrapping Cassandra, restbase-dev1004-b -- T224554
  • 19:48 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 05m 45s)
  • 19:42 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 19:41 mdholloway: mobileapps deployment failed repooling canary (scb2001); retrying
  • 19:40 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 02m 59s)
  • 19:37 XioNoX: fix eqsin CF tunnel misconfig
  • 19:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 17:56 andrewbogott: disabling puppet on labpuppetmaster1001 as part of T171188
  • 17:55 XioNoX: push cloudflare tunnel config to cr1-eqsin
  • 16:50 papaul: replacing Fan kit and power supplies on cr1-codfw
  • 14:22 urandom: bootstrapping Cassandra, restbase-dev1004-a -- T224554
  • 13:51 vgutierrez: upgrading ats to 8.0.5-1wm6 on cp5001 - T232298
  • 13:39 vgutierrez: uploaded trafficserver 8.0.5-1wm6 to apt.wikimedia.org (stretch) - T232298
  • 13:31 moritzm: installing facter update from buster 10.1 point release (T222356)
  • 13:15 moritzm: upgrading labweb/wikitech to PHP 7.2.22 T230024
  • 13:02 Urbanecm: Patch is deployed, deploy1001 should be clear
  • 13:01 moritzm: upgrading remaining mediawiki app servers (mw1266-mw1275) to PHP 7.2.22 T230024
  • 12:55 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/WikibaseMediaInfo/: ubn patch T231276 (duration: 00m 58s)
  • 12:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/Wikibase: ubn patch T231276 (duration: 01m 03s)
  • 12:48 moritzm: upgrading remaining job runners to PHP 7.2.22 T230024
  • 12:44 Urbanecm: EU SWAT wmf patch ongoing, testing with mwdebug1002
  • 12:41 ema: lvs1015 (primary): restart pybal to add service restbase-ssl T210411
  • 12:36 ema: lvs2003 (primary): restart pybal to add service restbase-ssl T210411
  • 12:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=eqiad
  • 12:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=codfw
  • 12:29 elukey: restart archiva again to debug download artifact issue
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase1022.eqiad.wmnet
  • 12:11 Urbanecm: Undeployed patch in wmf branch, will resolve soon
  • 12:01 moritzm: installing ldap-corp1001 T231015
  • 11:32 Urbanecm: Dry run for all wikis (T231137)
  • 11:26 moritzm: installing ldap-corp2001 T231015
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 10:22 effie: jiji@deploy1001:~$ scap sync-file wmf-config/CommonSettings.php "Push PHP7 traffic to 33.3% - T219150"
  • 09:48 moritzm: updated stretch netinst image to 9.11 T232308
  • 09:42 eileen: civicrm revision changed from d1d65f37ea to 516eeb54b5, config revision is 5a6a9c6c03
  • 09:40 moritzm: updated buster netinst image to 10.1 T232310
  • 09:28 ema: lvs1016, lvs2006 (secondaries): restart pybal to add service restbase-ssl T210411
  • 09:02 elukey: restart archiva on archiva1001 - stuck and not serving requests (no trace about why in the logs)
  • 08:55 eileen: civicrm revision is d1d65f37ea, config revision is 5a6a9c6c03
  • 08:38 vgutierrez: disabling systemd hardening for ats-tls on cp5001 - T232298
  • 07:33 moritzm: installing ghostscript security updates
  • 03:53 vgutierrez: reboot analytics-tool1001
  • 02:59 bd808: Testing twitter integration after software update for Stashbot. In theory messages up to 280 characters in length will now be passed through to the @wikimediatech Twitter feed without being truncated. This message should end with a unicorn face if that is correct. 🦄

2019-09-08

2019-09-06

  • 21:33 cdanis: cdanis@mw1317.eqiad.wmnet ~ 🕠 🍺 sudo -i depool
  • 21:27 James_F: mw1317 seems corrupted (Fatal error: Class undefined: stdClass in /srv/mediawiki/php-1.34.0-wmf.21/includes/libs/rdbms/database/DatabaseMysqli.php); running scap pull
  • 18:01 godog: silence esams pages for 30m
  • 17:43 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux (duration: 02m 55s)
  • 17:40 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux
  • 17:39 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3 (duration: 00m 21s)
  • 17:38 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3
  • 17:26 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 2 (duration: 00m 37s)
  • 17:25 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 2
  • 17:25 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux (duration: 01m 29s)
  • 17:24 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux
  • 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:43 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:38 ema: cp5001: restart trafficserver-tls.service to clear icinga alert after segfault
  • 12:36 moritzm: fix permissions on /var/spool/exim on krypton (hosts used to run the exim heavy role which uses different permissions than the light role)
  • 10:59 onimisionipe: force shard allocation - chi eqiad
  • 10:59 Amir1: ladsgroup@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki (T225056)
  • 10:17 moritzm: installing exim4 security updates
  • 08:43 mutante: webperf* - /usr/local/sbin/build-envoy-config -c /etc/envoy | rm /etc/envoy/listeners.d/00-tls_terminator_443.yaml | run puppet - envoy now listening on 443 (T210411)
  • 07:48 mutante: running puppet on cp-text_eqiad / cp1075 - switching releases.wikimedia.org to TLS to backend
  • 06:29 oblivian@deploy1001: Synchronized README: testing php conditional restarts (duration: 00m 55s)
  • 06:09 mutante: puppetmaster1001 - same for restbase-dev1005 and restbase-dev1006 (T224554)
  • 06:03 mutante: puppetmaster1001 - copying cassandra-ca-manager to /usr/local/bin - deleting expired restbase-dev1004 certs - running cassandra-ca-manager services-dev.yaml T224554
  • 05:31 marostegui: Stop MySQL on db2046 - T231767
  • 05:11 marostegui: Remove db2046 from tendril and zarcillo - T231767
  • 04:54 _joe_: run systemctl reset-failed on kafka1001 to clear a 13 hours icinga alert
  • 03:21 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (duration: 00m 14s)
  • 03:21 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291
  • 03:16 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (testing) (duration: 00m 20s)
  • 03:16 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (testing)
  • 03:07 chaomodus: restarting keyholder on deploy1001
  • 02:34 ejegg: rolled back payments-wiki to 51d9ed79b6
  • 02:25 ejegg: updated payments-wiki (again) from 51d9ed79b6 to 04120169b0... false alarm
  • 02:15 ejegg: payments-wiki rolled back to 51d9ed79b6
  • 02:11 ejegg: updated payments-wiki from 51d9ed79b6 to 04120169b0
  • 01:44 eileen: tools revision changed from 643c48b26a to 1e405864d7
  • 01:18 ayounsi@deploy1001: Finished deploy [netbox/deploy@367ca84]: test (duration: 00m 02s)
  • 01:18 ayounsi@deploy1001: Started deploy [netbox/deploy@367ca84]: test

2019-09-05

  • 23:13 ayounsi@deploy1001: Finished deploy [netbox/deploy@367ca84]: test (duration: 00m 42s)
  • 23:12 ayounsi@deploy1001: Started deploy [netbox/deploy@367ca84]: test
  • 23:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T151425 Require that passwords are not in the most common 100k list for all users (duration: 00m 48s)
  • 22:12 eileen: tools revision changed from b42bda6bf3 to 643c48b26a
  • 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (duration: 00m 03s)
  • 21:42 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291
  • 21:35 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: test deploy for netbox split - again (duration: 00m 12s)
  • 21:34 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: test deploy for netbox split - again
  • 19:28 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: c7678f0e3d638 (duration: 00m 47s)
  • 19:21 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: 7adf466614d (duration: 00m 48s)
  • 18:10 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: test deploy for netbox split (duration: 38m 39s)
  • 17:31 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: test deploy for netbox split
  • 16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all events to eventgate - T228705 - take 2 (duration: 00m 49s)
  • 16:06 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all events to eventgate - T228705 (duration: 00m 48s)
  • 16:04 ottomata: switching remaining job queue events (and all remaining events) to eventgate - T228705
  • 15:59 jynus: restarting batch processes on mwmaint1002 T232106
  • 15:54 jynus@deploy1001: Synchronized private/PrivateSettings.php: updating cli password (duration: 00m 47s)
  • 15:23 herron: beginning replacement of kafka1001 with kafka-main1001 T225005
  • 14:54 ema: restbase2009: repool after successful envoy deployment T210411
  • 14:50 ema: restbase2009: depool and add TLS termination w/ envoy -- https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/533028/ T210411
  • 14:42 XioNoX: remove iron from mr* routers - T231811
  • 14:30 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 14:15 @: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 14:11 cdanis: restarted swiftrepl on ms-fe1005 T231110
  • 13:54 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 13:39 moritzm: upgrading remaining API servers to PHP 7.2.22
  • 13:37 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 13:21 filippo@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=prometheus1003.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.21
  • 12:47 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 12:13 moritzm: upgrading mw1284-mw1290 to PHP 7.2.22
  • 12:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 11:57 moritzm: upgrading remaining job runners to PHP 7.2.22
  • 11:50 dcausse: EU swat done
  • 11:48 dcausse@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/CirrusSearch/: T159321: Add morelikethis a non-greedy version of the morelike keyword (duration: 00m 59s)
  • 10:53 godog: temporarily enable prometheus admin web api in prometheus@ops in eqiad to delete spammy metrics - T228395
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=prometheus1004.eqiad.wmnet
  • 10:46 moritzm: upgrading mw1221-mw1335 to PHP 7.2.22
  • 10:31 moritzm: upgrading mw1319-mw1333 to PHP 7.2.22
  • 10:28 _joe_: upgrading scap across the fleet T224857
  • 10:25 moritzm: upgrading mw1238-mw1258 to PHP 7.2.22
  • 09:39 mutante: ganeti1001 - creating VM moscovium (T232077)
  • 09:26 vgutierrez: rolling back from ats-tls to nginx on cp1076 - T231433
  • 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: Promote wikidatawiki to 1.34.0-wmf.21 for T232035 - T220746
  • 09:04 vgutierrez: rolling back from ats-tls to nginx on cp3034 - T231433
  • 08:55 hashar@deploy1001: rebuilt and synchronized wikiversions files: Rollback wikidatawiki to 1.34.0-wmf.20 for T232035
  • 08:38 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=a.*-ro,name=codfw
  • 08:37 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:35 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 akosiaris: depool restbase1022 T232007
  • 08:30 vgutierrez: rebooting cp3034
  • 08:23 vgutierrez: repooling cp3034
  • 08:21 hashar@deploy1001: rebuilt and synchronized wikiversions files: Promote wikidatawiki to 1.34.0-wmf.21 for T232035 - T220746
  • 08:16 moritzm: reimage restbase-dev1004 to Stretch T224554
  • 08:13 _joe_: upgrading scap on deploy1001
  • 08:09 vgutierrez: depooling cp3034 due to intermittent network issues
  • 07:57 _joe_: upgrading scap on mwdebug1001
  • 07:56 _joe_: uploading scap 3.12.1 to reprepro on all distros T224857
  • 07:56 hashar: Switching "wikidatawiki" on mwdebug1001 to 1.34.0-wmf.21 by editing /srv/mediawiki/wikiversions.php # T232035
  • 07:53 marostegui: Remove old backups for db2037 and db2042 from dbprov2001
  • 07:45 marostegui: Remove puppet grants from m1 for the following IPs: 10.64.0.165 10.64.16.159 10.64.16.18 T231539
  • 07:32 moritzm: upgrading mw1293-mw1296, mw1299-mw1306 to PHP 7.2.22
  • 07:31 mutante: ununpentium - removed /etc/envoy/envoy.yaml; ran /usr/local/sbin/build-envoy-config -c /etc/envoy to regenerate config without 443 listener; ran puppet; envoy now running on jessie
  • 07:07 mutante: ununpentium - manually delete /etc/envoy/listeners.d/00-tls_terminator_443.yaml after changing port to 1443 - puppet does not remove it
  • 06:44 kart_: Updated cxserver to 2019-09-04-065911-production (T213255, T206310)
  • 06:41 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 06:39 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 06:38 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 05:42 marostegui: Remove grants for dbproxy1005 T231280 T231967
  • 05:31 marostegui: Restart MySQL on eqiad sanitariums (db1124 and db1125) to pick up new filters - T51195
  • 05:29 marostegui: Restart wikibugs
  • 05:21 mutante: ganeti2005 - DRAC reset fails - ipmi_cmd_cold_reset: bad completion code
  • 05:19 mutante: ganeti2005 - reset DRAC via local IPMI since mgmt stopped responding
  • 05:14 marostegui: Restart MySQL on codfw sanitariums (db2094 and db2095) to pick up new filters - T51195
  • 04:57 vgutierrez: rearming keyholder on cumin1001
  • 04:42 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp4021 - T231433
  • 04:37 vgutierrez: switching cp4021 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp3034 - T231433
  • 04:20 vgutierrez: switching cp3034 from nginx to ats-tls - T231433
  • 04:02 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp1076 - T231433
  • 03:57 vgutierrez: switching cp1076 from nginx to ats-tls - T231433
  • 00:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out write of variant config into MWConfigCacheGenerator, part 2 (duration: 00m 53s)
  • 00:54 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out write of variant config into MWConfigCacheGenerator, part 1 (duration: 00m 56s)
  • 00:04 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out load of variant config into MWConfigCacheGenerator, part 2 (duration: 00m 55s)
  • 00:02 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out load of variant config into MWConfigCacheGenerator, part 1 (duration: 00m 55s)

2019-09-04

  • 23:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out variant config generation into MWConfigCacheGenerator, part 2 (duration: 00m 55s)
  • 23:33 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out variant config generation into MWConfigCacheGenerator, part 1 (duration: 00m 54s)
  • 23:05 urandom: decommission restbase-dev1004-b (Cassandra) -- T224554
  • 21:58 andrewbogott: attached to console on cumin1001, found it in bios 'system settings', exited, allowed boot to continue. No idea how it got there - spontaneous reboot?
  • 21:12 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: (no justification provided) (duration: 08m 55s)
  • 21:03 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: (no justification provided)
  • 20:14 urandom: decommission restbase-dev1004-a (Cassandra) -- T224554
  • 20:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:35 hashar@deploy1001: rebuilt and synchronized wikiversions files: rollback wikidatawiki to 1.34.0-wmf.20 for T232035 - T220746
  • 19:33 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:00 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.21 (duration: 00m 54s)
  • 18:59 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.21
  • 17:59 jforrester@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/: T229271 Homepage: Unbreak question dialogs on mobile (duration: 00m 56s)
  • 17:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 57s)
  • 17:45 jforrester@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 56s)
  • 17:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all non-low-traffic jobs to eventgate - T228705 - take 2 (duration: 00m 55s)
  • 17:34 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all non-low-traffic jobs to eventgate - T228705 (duration: 00m 56s)
  • 17:32 ottomata: Switch all non-low-traffic jobs to eventgate - T228705
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:48 joal@deploy1001: Finished deploy [analytics/refinery@2322f10]: Fix for yesterday regular analytics deploy (duration: 53m 16s)
  • 16:40 Lucas_WMDE: Morning SWAT done
  • 16:38 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/AbuseFilter: SWAT: Fix filter validation in ViewEdit (T231985) (duration: 00m 58s)
  • 16:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 533172|Move ContentTranslation out of Beta in jvwiki (T231207) (duration: 00m 56s)
  • 15:55 joal@deploy1001: Started deploy [analytics/refinery@2322f10]: Fix for yesterday regular analytics deploy
  • 15:36 godog: upgrade grafana to 5.4.5 on labmon
  • 14:51 andrewbogott: reimaging cloudvirt1015 for T220853
  • 14:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove obsoleted DB config from db-eqiad.php T231642 (duration: 00m 57s)
  • 14:08 cdanis: If0dd79604 actually live on canaries now
  • 14:04 cdanis: If0dd79604 deployed to eqiad MW canaries T231642
  • 13:59 moritzm: installing nghttp2 security updates
  • 13:59 cdanis: manually testing If0dd79604 on mwdebug1001
  • 13:47 _joe_: restarting php7.2-fpm across the fleet to pick up the apc.ttl removal
  • 13:20 cdanis@deploy1001: Synchronized wmf-config/db-codfw.php: a8dc4c4a0 db-codfw: remove obsoleted DB config T231642 (duration: 00m 55s)
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 13:17 oblivian@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 13:17 oblivian@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 13:17 oblivian@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 12:56 cdanis: manually testing I1bc6d1603 on mwdebug2002
  • 12:49 gehel: reset kartotherian password on maps slaves - T231964
  • 12:36 gehel: restart kartotherian on maps1001 - T231964
  • 11:52 dcausse: EU SWAT done
  • 11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T231194: [cirrus] Reenable sanity checks (duration: 00m 56s)
  • 11:47 dcausse@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/CirrusSearch/: T159321: Add morelikethis a non-greedy version of the morelike keyword (duration: 00m 57s)
  • 11:47 Amir1: start of ladsgroup@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --to-id 2000000 --sleep 2 > ~/rebuildItemTerms.out 2> rebuildItemTerms.err (T225056). This is going to take a while. On screen
  • 11:38 moritzm: upgrading mw1339-mw1348 to PHP 7.2.22
  • 11:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms migration stage for Wikidata on WRITE_BOTH up to Q2m (T225055) (duration: 00m 55s)
  • 11:32 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add high-density logos for the Incubator (T230122) (duration: 00m 56s)
  • 11:28 ladsgroup@deploy1001: Synchronized static/images/project-logos/incubatorwiki-2x.png: SWAT: Add high-density logos for the Incubator (T230122) Part II (duration: 00m 54s)
  • 11:27 ladsgroup@deploy1001: Synchronized static/images/project-logos/incubatorwiki-1.5x.png: SWAT: Add high-density logos for the Incubator (T230122) Part I (duration: 00m 52s)
  • 11:24 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/project-logos/wikidatawiki-1.5x.png' | mwscript purgeList.php wikidatawiki # T230120
  • 11:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add high-density logos for Wikidata (T230120) (duration: 00m 55s)
  • 11:14 ladsgroup@deploy1001: Synchronized static/images/project-logos/wikidatawiki-2x.png: SWAT: Add high-density logos for Wikidata (T230120) Part II (duration: 00m 56s)
  • 11:12 ladsgroup@deploy1001: Synchronized static/images/project-logos/wikidatawiki-1.5x.png: SWAT: Add high-density logos for Wikidata (T230120) Part I (duration: 00m 56s)
  • 10:42 marostegui: Start event scheduler on db1115 T231769
  • 10:23 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp2002 - T231859
  • 10:20 marostegui: Start MySQL on db1115 without the event scheduler - T231769
  • 10:12 marostegui: Stop MySQL on db1115 without the event scheduler - T231769
  • 10:12 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp5001 - T231859
  • 10:11 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:11 marostegui: Tendril/dbtree will be unavailable for a few minutes T231769
  • 10:11 marostegui: Stop MySQL on db1115 - T231769
  • 10:09 vgutierrez: uploaded trafficserver 8.0.5-1wm5 to apt.wikimedia.org (stretch) - T231533 T231859
  • 09:33 moritzm: upgrading mw servers in codfw to 7.2.22
  • 09:19 _joe_: uploaded envoyproxy to buster
  • 08:56 moritzm: upgrading mw1238-mw1258 to PHP 7.2.22
  • 08:42 marostegui: Stop HAproxy on dbproxy1005 - T231967
  • 08:37 moritzm: upgrading API canaries in eqiad to 7.2.22
  • 08:26 marostegui: Reboot db1135 to pick up new kernel - T231403
  • 07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2047 from config T231852 (duration: 00m 54s)
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2047 from config T231852 (duration: 00m 57s)
  • 07:21 mutante: ununpentium - a2dismod ssl - systemctl restart apache2
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/resources/src/startup/mediawiki.js: 8a1b13026 (duration: 00m 55s)
  • 02:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/resources/src/mediawiki.base/mediawiki.base.js: 8a1b13026 (duration: 00m 56s)
  • 02:21 chaomodus: extending downtime on netmon1002 and netmon2001, netbox1001, netbox2001, netboxdb1001 and netbox2001 should be stable but are still being debugged
  • 01:02 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ed5297c10 / T217830 (duration: 00m 59s)
  • 00:02 chaomodus: installing and setting up netbox instances T223291

2019-09-03

  • 23:57 niharika29@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935 (duration: 00m 55s)
  • 23:56 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935 (duration: 00m 55s)
  • 23:54 niharika29@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 55s)
  • 23:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 56s)
  • 23:42 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/tests/phpunit/: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 57s)
  • 23:41 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/includes/block: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 55s)
  • 23:28 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure ORES damaging and goodfaith on zhwiki T225562 (duration: 00m 58s)
  • 23:10 ebernhardson: production-search-eqiad all indices index.merge.policy.deletes_pct_allowed=20
  • 22:54 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208694 Set CentralNotice's wgNoticeProjects for wikimedia (duration: 00m 59s)
  • 22:45 eileen: process-control config revision is 100334de4a adjust silverpop schedule
  • 19:42 XioNoX: rollback OSPF metric change on eqiad-codfw Zayo link (1320->320)
  • 19:20 fdans@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 19:18 fdans@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 19:14 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch high-traffic jobs to eventgate. Take 2 - T228705 (duration: 00m 56s)
  • 19:12 ottomata: switching jobqueue events to eventgate-main - T228705
  • 18:41 urbanecm@deploy1001: Synchronized wmf-config/: Emergency fix: GE not loading configuration properly: newbie facing feature (duration: 00m 57s)
  • 18:35 Urbanecm: Livetesting on mwdebug1002
  • 17:45 James_F: Pulled I9b64a2bb770 into wmf.21 production on the deploy server; no need to deploy to app-servers, CI-only fix.
  • 17:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21
  • 16:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/Graph/includes/ApiGraph.php: T231894 (duration: 00m 55s)
  • 16:01 joal@deploy1001: Finished deploy [analytics/refinery@8b17711]: Fixes for regular analytics deploy (duration: 136m 59s)
  • 15:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T227260 (duration: 00m 54s)
  • 15:32 ebernhardson: unban elastic1027 from production-search-eqiad
  • 15:07 hashar@deploy1001: rebuilt and synchronized wikiversions files: testwiki 1.34.0-wmf.21 for T231894 - T220746
  • 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: Rollback group0 to 1.34.0-wmf.21 - T220746
  • 14:45 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21 - T220746
  • 14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Promote db1133 as wikitech master T229657 (duration: 00m 54s)
  • 14:28 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746 (duration: 50m 09s)
  • 14:21 moritzm: upgrading app server canaries to PHP 7.2.22 T230024
  • 13:44 joal@deploy1001: Started deploy [analytics/refinery@8b17711]: Fixes for regular analytics deploy
  • 13:38 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746
  • 13:26 hashar: Gerrit should be fine again, apparently was due to the wmf branch cut taking too much resources (sic) - T231872 filed to investigate
  • 13:25 hashar: 1.34.0-wmf.21 cut
  • 13:16 hashar: Gerrit has some random timeouts from time to time (no reason)
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1073 from wikitech T229657', diff saved to https://phabricator.wikimedia.org/P9038 and previous config saved to /var/cache/conftool/dbconfig/20190903-131456-marostegui.json
  • 13:13 marostegui: Re-enable puppet on db1073 and db1133 T229657
  • 13:11 marostegui: Reload haproxy on dbproxy1005 T229657
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech back to RW after maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9037 and previous config saved to /var/cache/conftool/dbconfig/20190903-131000-marostegui.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech as read-only for maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9033 and previous config saved to /var/cache/conftool/dbconfig/20190903-130113-marostegui.json
  • 13:00 marostegui: Failover m5 from db1073 to db1133 - T229657
  • 12:52 moritzm: uploaded PHP 7.2.22 to component/php72 T230024
  • 12:39 moritzm: upgrading mwdebug2001 to PHP 7.2.22
  • 12:29 hashar: Cutting wmf/1.34.0-wmf.21 # T220746
  • 12:19 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.20
  • 12:02 marostegui: Disable puppet on db1073 and db1133 - T229657
  • 11:55 marostegui: Change topology on m5 and make everything replicate from db1133 - T229657
  • 11:48 marostegui: Downtime m5 hosts T229657
  • 11:35 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --to-id 1000 --sleep 2 (T225056)
  • 11:29 Amir1: EU SWAT is done
  • 11:29 Amir1: ladsgroup@mwmaint1002:~$ mwscript namespaceDupes.php bswiki --fix (T231654)
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix wgMetaNamespaceTalk for bswiki (T231654) (duration: 00m 54s)
  • 11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016) (duration: 00m 52s)
  • 11:11 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016) (duration: 00m 53s)
  • 11:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WRITE_BOTH for items term store for wikidatawiki (T225055) (duration: 00m 55s)
  • 10:17 ema: cp1083: varnish-backend-restart -- mbox lag, fetch failures
  • 09:59 _joe_: removing old lvs-related scripts from ores*
  • 09:46 moritzm: moved uid=smalyshev from cn=wmf to cn=nda
  • 09:46 mutante: install1002 - import GPG key for getenvoy repo, importing envoy for jessie with reprepro update
  • 09:16 hashar: Deploy refactor of Zuul pipelines which might mean that some repos/branches would miss jobs or have extra unwanted jobs. In such case please file a task against #continuous-integration-config
  • 09:04 ema: cp1085: varnish-backend-restart, mbox lag and fetch failures
  • 09:03 gehel: reset kartotherian password - T231842
  • 08:54 ema: cp1089: varnish-backend-restart due to mbox lag and fetch failures
  • 08:49 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 08:49 ema: cp1075: pool ats-be with caching enabled T228629
  • 08:26 marostegui: Add REPLICATION grant to wikiuser and wikiadmin on db1073 with replication enabled - T229657
  • 08:21 gehel: purging maps / info.json from cache - T231842
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9031 and previous config saved to /var/cache/conftool/dbconfig/20190903-080958-marostegui.json
  • 08:04 joal@deploy1001: Finished deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try (duration: 00m 27s)
  • 08:03 joal@deploy1001: Started deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try
  • 08:02 joal@deploy1001: deploy aborted: Regular weekly analytics deploy train (duration: 27m 47s)
  • 07:16 marostegui: Change min_replicas to 6 on s1 for eqiad and codfw T231019
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9029 and previous config saved to /var/cache/conftool/dbconfig/20190903-063932-marostegui.json
  • 06:10 mutante: running puppet on cp-text_eqiad to switch people.wm.org to https backend
  • 06:04 marostegui: Change min_replicas to 4 on s7 for eqiad and codfw T231019
  • 05:53 mutante: people.wikimedia.org - switching to TLS termination with envoy
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Reorganize s7 codfw T230106', diff saved to https://phabricator.wikimedia.org/P9028 and previous config saved to /var/cache/conftool/dbconfig/20190903-055234-marostegui.json
  • 05:47 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s7 codfw T230106 (duration: 00m 54s)
  • 05:22 marostegui: Rename tables on the puppet database on m1 master - T231539
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2118 to s7 codfw master (db2047 -> db2118) T230106 (duration: 00m 54s)
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2047 old master from s7 T230106', diff saved to https://phabricator.wikimedia.org/P9027 and previous config saved to /var/cache/conftool/dbconfig/20190903-051619-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 codfw master (db2047 -> db2118) T230106', diff saved to https://phabricator.wikimedia.org/P9026 and previous config saved to /var/cache/conftool/dbconfig/20190903-051450-marostegui.json
  • 05:02 marostegui: Promote db2118 to s7 codfw master (db2047 -> db2118) T230106
  • 04:50 marostegui: Drop filejournal table on s3 - T51195
  • 04:49 vgutierrez: repooling cp2002 - T231433
  • 04:36 vgutierrez: upgrading ATS to 8.0.5-1wm4 on cp2002 - T231433
  • 04:28 vgutierrez: Switching cp2002 from nginx to ats-tls - T231433

2019-09-02

  • 22:08 ebernhardson: ban elastic1027 from production-search-chi
  • 20:48 ebernhardson: restart production-search-eqiad on elastic1027 again
  • 20:33 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@453ee8a]: Make osm-pbf source private (T231842) (duration: 02m 09s)
  • 20:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@453ee8a]: Make osm-pbf source private (T231842)
  • 19:54 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027
  • 17:57 mateusbs17: regenerating tiles from z0 to z9 in eqiad and codfw - T231691, T230511
  • 15:08 moritzm: installing libssh2 security updates
  • 14:36 moritzm: installing ghostscript updates on thumbor1001
  • 14:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 14:21 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 14:10 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 13:44 akosiaris: resync the sessionstore staging release as there was a wrong port mapping (port 8080 instead of 8081) for both netpol and service
  • 13:43 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:09 vgutierrez: upgrading prometheus-trafficserver-exporter to version 0.3.2 on the cache cluster - T231533
  • 12:58 vgutierrez: upgrading prometheus-trafficserver-exporter to version 0.3.2 on cp5001 - T231533
  • 12:46 vgutierrez: uploaded prometheus-trafficserver-exporter 0.3.2 to apt.wikimedia.org (stretch) - T231533
  • 12:40 moritzm: installing freetype security updates on jessie (stretch/buster already fixed)
  • 11:23 moritzm: installing apache2 security updates on jessie
  • 11:18 moritzm: imported apache2 2.4.10-10+deb8u15+wmf1 to apt.wikimedia.org/jessie-wikimedia (rebuild of latest Jessie update against our patches)
  • 10:25 moritzm: installing libav security updates
  • 10:07 moritzm: installing subversion security updates on jessie
  • 09:21 marostegui: Drop filejournal table on s7 - T51195
  • 09:15 marostegui: Drop filejournal table on s1 - T51195
  • 08:45 marostegui: Drop filejournal table on s8 - T51195
  • 08:27 marostegui: Drop filejournal table on labtestwiki - T51195
  • 08:25 marostegui: Drop filejournal table on s2 - T51195
  • 08:15 godog: upgrade grafana to 5.4.5 on grafana1001
  • 08:12 godog: update amd-rocm debian repository gpg key (same id, new expiration)
  • 07:34 marostegui: Drop filejournal table on s4 - T51195
  • 07:26 marostegui: Drop filejournal table on s5 - T51195
  • 07:17 marostegui: Drop filejournal table on s6 - T51195
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2046 from config T231767 (duration: 00m 53s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2046 from config T231767 (duration: 00m 55s)

2019-09-01

  • 17:53 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=enwikiquote --verbose (T231137)
  • 17:45 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=metawiki --verbose (T231137)
  • 17:33 Urbanecm: Run foreachwikiindblist group1.dblist extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)
  • 17:29 Urbanecm: Previous should be *group0.dblist (T231137)
  • 17:29 Urbanecm: Run foreachwikiindblist group0 extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)


Archives

See Server admin log/Archives.