Server Admin Log

Revision as of 23:50, 18 May 2020 by imported>Stashbot (pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0))

2020-05-18

  • 23:50 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:12 ryankemper: Restarted `wdqs-updater` across all wdqs nodes and restarted `wdqs-categories` across all nodes except 1010 (test wdqs server) and 1009 (automated deployment server)
  • 22:55 Krinkle: Clear module_deps on dewiki (group2, old mw version, s5) to monitor regeneration
  • 22:48 Krinkle: Clear module_deps on group0 (mostly s3) to monitor regeneration
  • 22:35 Krinkle: Clear module_deps on commonswiki (group1, s4) to monitor regeneration
  • 22:33 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@4886dc3]: 0.3.32 (duration: 17m 12s)
  • 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:18 Krinkle: Clear module_deps on s2 wikis to monitor regeneration
  • 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:15 ryankemper@deploy1001: Started deploy [wdqs/wdqs@4886dc3]: 0.3.32
  • 22:02 Krinkle: Clear module_deps on hewiki (group1, s7) to monitor regeneration, ref T247028
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:23 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/resourceloader/dependencystore/: I015fa5885, I972a93806006 (duration: 01m 07s)
  • 21:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:27 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349 (duration: 03m 31s)
  • 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349
  • 19:07 herron: performing rolling maintenance on kafka-main to pick up java security updates
  • 19:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ic005093778d (duration: 01m 08s)
  • 18:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Ic005093778d (duration: 01m 06s)
  • 18:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:38 volans: upgraded spicerack to 0.0.37-1 on cumin[12]001
  • 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix English Wikipedia wordmark dimensions (T252143) (duration: 01m 06s)
  • 17:14 XioNoX: update domain object for 56.15.185.in-addr.arpa - T247972
  • 17:06 bblack: dns1001 - removing downtimes, back in service - T241770
  • 16:45 bstorm_: updated views on labsdb1011 for the wb_terms changes T251598
  • 16:32 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:17 bblack: dns1001 - reimaging for new NIC - T241770
  • 16:10 volans: uploaded spicerack_0.0.37-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 15:52 hnowlan: rolling codfw cassandra for java security updates
  • 15:51 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 15:11 Krinkle: krinkle@mc1021 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 14:57 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:56 hnowlan: roll-restart of sessionstore cassandra hosts for java security update
  • 14:55 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 14:35 hnowlan@deploy1001: Finished deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this (duration: 01m 22s)
  • 14:34 hnowlan@deploy1001: Started deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this
  • 14:33 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of esams T133821
  • 14:29 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqiad T133821
  • 14:23 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqsin, ulsfo T133821
  • 14:19 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of codfw T133821
  • 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2073 while replacing it T252985', diff saved to https://phabricator.wikimedia.org/P11216 and previous config saved to /var/cache/conftool/dbconfig/20200518-141505-kormat.json
  • 14:12 bblack: dns1001 - shutting down for T241770
  • 14:09 volans: uploaded spicerack_0.0.36-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 14:07 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching back to authdns1001 (oops, that's not the server we're taking offline today!)
  • 14:06 vgutierrez: upload trafficserver 8.0.7-1wm9 to apt.wm.o (buster)
  • 14:02 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 13:57 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching from authdns1001 to dns1002 for T241770
  • 13:29 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 13:00 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: VectorTemplate: SkinTemplateToolboxEnd hook isn't deprecated - T252906 (duration: 01m 07s)
  • 11:52 marostegui: Install 10.1.43-2 on db1122 and db1109 - T251981
  • 11:27 Lucas_WMDE: EU SWAT done
  • 11:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Wikibase/: SWAT: Fix core's TitleFactory not being used correctly (T252803) (duration: 01m 12s)
  • 11:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update GrowthExperiments mentor list page for viwiki (duration: 01m 06s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Make the threshold for Chinese WP to prevent publishing 5% more strict (T252786) (duration: 01m 06s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (597033) (duration: 01m 06s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (597033) (duration: 01m 32s)
  • 10:37 elukey: copy prometheus-druid-exporter 0.8-1 from stretch to buster wikimedia
  • 10:20 _joe_: upgrading purged in the remaining datacenters
  • 10:07 elukey: upload druid 0.12.3-1.1 to stretch|buster-wikimedia
  • 10:02 vgutierrez: upload trafficserver 8.0.7-1wm8 to apt.wm.o (buster)
  • 09:53 _joe_: upgrading purged in codfw, ulsfo
  • 09:46 mutante: contint2001 - apt-get remove --purge openjdk-11-* - T224591
  • 09:43 _joe_: upload purged 0.13 to buster-wikimedia
  • 08:44 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 08:13 godog: set weight to 0 for all but objects in ms-be10[678] - T252008
  • 07:57 mutante: replacing apache module with httpd module on deployment servers
  • 07:47 moritzm: installing apt security updates on jessie systems
  • 07:36 marostegui: Remove and add pc2007 from tendril as the Act is frozen after reimage - T250666
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088 after upgrade', diff saved to https://phabricator.wikimedia.org/P11214 and previous config saved to /var/cache/conftool/dbconfig/20200518-072234-marostegui.json
  • 07:20 marostegui: Upload MariaDB 10.4.13 to the buster repo - T250666
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:41 marostegui: Stop MySQL on db2088
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 for upgrade', diff saved to https://phabricator.wikimedia.org/P11213 and previous config saved to /var/cache/conftool/dbconfig/20200518-062452-marostegui.json
  • 05:55 _joe_: installing purged 0.12 on cp2027
  • 05:54 _joe_: uploaded purged 0.12 to apt.w.o
  • 05:00 marostegui: Stop MySQL on labsdb1011 to copy its content to backup1001 T249188

2020-05-16

  • 22:04 Krinkle: krinkle@mc1022 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 21:56 Krinkle: krinkle@mc1019 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 20:23 Krinkle: krinkle@mc1034,mc1035,mc1036 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 20:04 Krinkle: krinkle@mc1033 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:57 Krinkle: krinkle@mc1032 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:51 Krinkle: krinkle@mc1031 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:42 Krinkle: krinkle@mc1030 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:25 Krinkle: krinkle@mc1029 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:10 Krinkle: krinkle@mc1028 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 18:58 Krinkle: krinkle@mc1027 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:54 Krinkle: krinkle@mc1026 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:30 Krinkle: krinkle@mc1024 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:24 Krinkle: krinkle@mc1025 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 17:56 Krinkle: krinkle@mc1023 Pruning old echo:seen: Redis keys that didn't use a ttl yet, ref T252945
  • 17:49 Krinkle: krinkle@mwmaint1002: Running cleanupRemovedModules.php to prune old module_deps rows T113916
  • 17:24 Krinkle: krinkle@mc1020 Prune old echo:seen: keys that have ttl:-1 from Redis main stash, ref T252945
  • 15:16 Krinkle: krinkle@mc1020 Looking at why there are still over 2M echo:seen keys in redis main stash
  • 00:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: I046868190b472 (duration: 01m 13s)
  • 00:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:10 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:06 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer

2020-05-15

  • 23:50 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:46 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:46 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:46 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:35 ryankemper: Pooled wdqs2007 following successful query tests (all data transfers are done now)
  • 22:53 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I1b1578a57ef5 (duration: 01m 07s)
  • 22:51 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Iaa240eb8cf9 (duration: 01m 06s)
  • 21:41 ryankemper: depooled wdqs2007 while it catches up on lag
  • 21:40 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:33 ryankemper: pooled wdqs2003 and wdqs1007 following successful query tests
  • 19:46 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: If0fd1b51 (duration: 01m 08s)
  • 18:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:34 ryankemper: depooled wdqs2003 while lag catches up
  • 18:32 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:55 vgutierrez: upload acme-chief 0.25 to apt.wm.o (buster) - T252881
  • 17:27 XioNoX: renumber cr2-eqord:xe-0/1/1 to xe-0/1/3 - T221259
  • 17:02 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 17:01 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:00 ryankemper: depooled wdqs1007 in preparation for impending wdqs data xfer
  • 16:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:02 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:56 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:52 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:49 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:45 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:40 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:36 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:32 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:31 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:27 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 14:19 cdanis: reverting sysctl net.ipv4.udp_mem to original on netflow3001
  • 14:18 cdanis: re-enable puppet on netflow*
  • 14:14 cdanis: disable puppet on netflow*
  • 14:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:47 ema: cp2029, cp3050: varnish-fe-restart to clear 'child restarted' alerts
  • 13:47 vgutierrez: downgrade ats to version 8.0.7-1wm7 on cp4032
  • 13:42 vgutierrez: upgrade ats to version 8.0.7-1wm8 on cp4032
  • 13:37 mutante: rsyncing gerrit git data from gerrit1001 to gerrit1002 (T200739)
  • 13:13 cdanis: increase samplicator recvbuf on netflow3001 & restart samplicator
  • 13:01 cdanis: increasing sysctl net.ipv4.udp_mem on netflow3001
  • 09:57 vgutierrez: upload trafficserver 8.0.7-1wm7 to apt.wm.o (buster)
  • 09:21 ema: cp2029: attempt forced discard of stuck VCL T236754
  • 09:09 elukey: restart druid brokers on druid100[4-6] - locked up due to datasources dropped - T226035
  • 08:51 ema: cp2029: try out varnish 5.1.3-1wm15 T236754
  • 07:36 XioNoX: bumps prefix limit for AS16735 in eqiad
  • 05:35 jynus: stop replication on pc2009, pc2010 for benchmarking T252761
  • 04:53 volker-e@deploy1001: Finished deploy [design/style-guide@dc956a3]: Deploy design/style-guide: (duration: 00m 10s)
  • 04:52 volker-e@deploy1001: Started deploy [design/style-guide@dc956a3]: Deploy design/style-guide:
  • 04:42 vgutierrez: repool cp5006
  • 04:28 vgutierrez: depool and reboot cp5006

2020-05-14

  • 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Revert temporary 20k logo for vecwiki (T252770) (duration: 01m 06s)
  • 23:23 RoanKattouw: Ran namespaceDupes.php for T252343
  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create Gapura (Portal) namespace on jvwiki (T252343) (duration: 01m 06s)
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.ub.uni-heidelberg.de and hq.eso.org to $wgCopyUploadDomains (T252600, T252726) (duration: 01m 07s)
  • 21:43 ryankemper: depooled wdqs2006 while lag recovers
  • 21:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:16 volans: moved codereview.tar.gz and with_r.tar.gz from miscweb1002 to cumin1001 to free space
  • 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: Allow plain text labels in side bar - T252727 (duration: 01m 06s)
  • 19:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:49 ryankemper: Depooled wdqs1006 in preparation for impending wdqs data xfer
  • 18:36 Urbanecm: Morning SWAT done
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 15adbbc: [thwikisource] Set ProofReadPage separator to an empty string (T252610) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 4b8399c: Undeploy graphoid from mediawikiwiki (T242855) (duration: 01m 05s)
  • 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f03a45c: Adding import to test wikis from mediawikiwiki (T242855) (duration: 01m 07s)
  • 17:03 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 1 member 1 - T252797
  • 16:55 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 3 member 1 - T252797
  • 16:51 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 port 48 member 2 - T252797
  • 16:50 XioNoX: request virtual-chassis vc-port set pic-slot 1 port 2 member 1 - T252797
  • 16:42 XioNoX: request virtual-chassis vc-port delete pic-slot 1 port 2 member 1 - T252797
  • 16:36 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 port 48 member 2 - T252797
  • 15:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:25 XioNoX: disable asw2-d1-eqiad:et-1/1/0 - T251663
  • 14:39 mutante: kuai kuai is https://twitter.com/Arlieth/status/1257714333133357056 | https://en.wikipedia.org/wiki/Kuai_Kuai_culture
  • 13:31 _joe_: updating purged to 0.11 in eqiad,eqsin,esams
  • 12:47 vgutierrez: rolling upgrade ats to version 8.0.7-1wm7
  • 12:46 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 12:43 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 12:22 kormat: reverted iosched on pc1010 to `mq-deadline` T252761
  • 11:47 kormat: changed iosched on pc1010 to `none` as a test T252761
  • 11:07 matthiasmullie: EU swat done
  • 11:05 mlitn@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/WikibaseMediaInfo/: [MediaInfo] Enable media search for all users by default (duration: 01m 12s)
  • 11:04 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp3064
  • 10:31 fdans@deploy1001: Finished deploy [analytics/refinery@6f13979]: Regular analytics weekly train (duration: 17m 14s)
  • 10:14 fdans@deploy1001: Started deploy [analytics/refinery@6f13979]: Regular analytics weekly train
  • 09:58 elukey: remove matomo 3.11 from the main component of stretch-wikimedia
  • 09:56 elukey: upgrade matomo on matomo1001 to 3.13.3 (latest upstream) - T252741
  • 09:30 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 09:29 elukey: upload matomo-3.13.3 to thirdparty/matomo on stretch|buster-wikimedia
  • 09:22 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 08:57 elukey: imported gpg key 1FD752571FE36FF23F78F91B81E2E78B66FED89E in apt1001 (Matomo public debian repo)
  • 08:56 moritzm: installing Java security updates on Presto
  • 08:43 jayme: updated helm: 2.12.2-1 -> 2.16.7-1 on deploy[1,2]001 and contint1001. 2.12.2-4 -> 2.16.7-1 on contint2001
  • 08:39 jayme: imported helm 2.16.7-1 to main for jessie-wikimedia
  • 08:32 moritzm: installing Java security updates on Hadoop/AQS/Druid
  • 08:20 jayme@deploy2001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:00 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp5011
  • 07:03 moritzm: installing apt security updates
  • 06:33 ryankemper: Pooled wdqs2005 following successful test queries
  • 04:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:02 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:59 ryankemper: wdqs1005 has been de-pooled pending wdqs data xfer
  • 02:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 02:57 ryankemper: wdqs1004 was repooled after successful test queries
  • 02:55 ryankemper: wdqs2006 was repooled after successful test queries
  • 01:32 ryankemper: depooled wdqs2006 while waiting for lag to recover
  • 00:54 foks: change password for "Python eggs"
  • 00:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:31 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 twentyafterfour: phabricator update appears to be stable.
  • 00:05 twentyafterfour: updating phabricator. 1 patch + new translations. Expect only brief downtime.

2020-05-13

  • 23:46 cstone: SmashPig revision changed from cd1a49da5f to 2702b04329
  • 23:43 ejegg: updated payments-wiki from dabba1804c to 3c465cb11c
  • 23:36 ejegg: rolled back payments-wiki to dabba1804c
  • 23:29 ejegg: updated payments-wiki from dabba1804c to 3c465cb11c
  • 22:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:39 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper: Depooled wdqs1004 for subsequent wdqs data xfer
  • 22:29 ryankemper: Pooled wdqs2005 given that lag has returned to normal levels and the instance is responding to queries correctly
  • 22:26 ryankemper: Pooled wdqs1008 given that lag has returned to normal levels and the instance is responding to queries correctly
  • 21:30 elukey: powercycle analytics1055
  • 21:05 eileen: civicrm revision changed from cfb6101e39 to ed4c9522ac, config revision is 2eb75f8dff
  • 20:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T242430 Stop loading the ParsoidBatchAPI extension (duration: 01m 08s)
  • 19:09 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32 (duration: 01m 05s)
  • 19:08 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32
  • 18:54 twentyafterfour: restarted php-fpm on phab1001
  • 18:53 thcipriani: restarting gerrit
  • 18:52 twentyafterfour: restarting apache on phab1001 for lack of a better idea
  • 18:50 herron: restarted kafka broker on kafka-main1001 for java security updates
  • 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 38db3e0: Update production wordmarks (T252143) (duration: 01m 07s)
  • 18:17 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 38db3e0: Update production wordmarks (T252143) (duration: 01m 09s)
  • 17:55 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:24 ryankemper: Manually depooled wdqs2005 while lag catches up following the data xfer
  • 17:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:18 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:12 urandom: restarted cassandra-c, restbase2017
  • 17:04 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 16:11 James_F: Running AbuseFilter updateVarDumps on group0 on mwmaint1002 T246539
  • 16:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:32 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp4032
  • 15:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:30 jayme: imported scap 3.14.0-1 to main for buster-wikimedia
  • 15:30 jayme: imported scap 3.14.0-1 to main for jessie-wikimedia
  • 15:29 ryankemper: Manually de-pooling `wdqs1008.eqiad.wmnet` in preparation for wdqs data transfer
  • 15:29 jayme: imported scap 3.14.0-1 to main for stretch-wikimedia
  • 15:26 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:55 _joe_: upgrading + restarting purged across ulsfo and codfw T133821
  • 14:50 filippo@deploy1001: Finished deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 T251222 (duration: 00m 10s)
  • 14:50 filippo@deploy1001: Started deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 T251222
  • 14:35 vgutierrez: upload trafficserver 8.0.7-1wm6 to apt.wm.o (buster) - T249335 T251537
  • 13:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:55 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.deutsche-digitale-bibliothek.de to the wgCopyUploadsDomains (T252296) (duration: 01m 06s)
  • 11:17 Amir1: EU SWAT is done
  • 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable wgLegacyJavaScriptGlobals on fawiki and wikidatawiki (T72470) (duration: 01m 06s)
  • 11:09 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:06 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Anchor RegExp for Data Bridge in Beta (BETA-ONLY) (duration: 01m 06s)
  • 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
  • 10:55 volans: imported tqdm 4.11.2-1 packages into buster-wikimedia component/spicerack
  • 10:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:09 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master T252182 (duration: 01m 05s)
  • 09:55 jbond42: deployed a fix to ferm-status script. unmanaged ferm rules may get removed
  • 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 marostegui: Upgrade db2102 to the new 10.4.13 - T250666
  • 09:32 _joe_: installing purged 0.11 on cp2027 T133821
  • 09:21 _joe_: installing purged 0.11 on cp2028 T133821
  • 09:11 moritzm: re-enabling puppet
  • 09:08 mutante: rsyncing /home dirs from people.wikimedia.org to new backend people1002
  • 09:00 moritzm: disabling puppet temporarily
  • 08:53 _joe_: uploaded purged 0.11
  • 08:52 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 as pc1 master T252182 (duration: 01m 17s)
  • 07:42 jayme: imported helm 2.16.7-1 to main for stretch-wikimedia
  • 07:41 jayme: imported helm 2.16.7-1 to main for buster-wikimedia
  • 07:29 godog: roll-restart logstash in codfw/eqiad for configuration change
  • 07:14 elukey: upload spark2_2.4.4-bin-hadoop2.6-2 for buster/stretch on apt1001
  • 05:33 ryankemper: wdqs2004 was depooled ~3 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
  • 05:32 ryankemper: wdqs1003 was depooled ~6 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
  • 05:27 _joe_: restarting php-fpm on mw1374, children dying with SIGILL
  • 05:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 05:11 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
  • 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 05:10 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 04:52 kart_: Updated cxserver to 2020-05-11-082207-production (T250004)
  • 04:47 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:44 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:42 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 02:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:33 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer

2020-05-12

  • 23:09 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/includes/revisionlist/RevisionItemBase.php: Fix RevisionItemBase::getId to actually return an int, as intended - T252076 (duration: 01m 06s)
  • 19:55 dpifke@deploy1001: Finished deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - T238086 (duration: 00m 05s)
  • 19:55 dpifke@deploy1001: Started deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - T238086
  • 19:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.32
  • 18:41 legoktm: started codereview-archiver script in screen on mwmaint1002
  • 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:49 bblack: 'gdnsdctl replace' on all authdns to load new maxmind data
  • 17:43 bblack: updating maxmind database on puppetmasters (usually automated weekly; we're mid-cycle)
  • 17:10 James_F: Running AbuseFilter updateVarDumps on testwikis on mwmaint1002 T246539
  • 16:55 James_F: Running AbuseFilter updateVarDumps on closed wikis on mwmaint1002 T246539
  • 16:55 mstyles@deploy1001: Finished deploy [wdqs/wdqs@f617307]: v0.3.31 (duration: 14m 53s)
  • 16:40 mstyles@deploy1001: Started deploy [wdqs/wdqs@f617307]: v0.3.31
  • 16:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:34 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query
  • 15:15 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:14 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:13 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 moritzm: installing 4.19.118 Linux updates on Buster nodes (reboots happening later)
  • 15:02 moritzm: upgrading contint2001 to openjdk-8 u252
  • 15:01 godog: bounce pybal on lvs2010 and lvs2009 - T252186
  • 14:40 moritzm: imported openjdk-8 u252 forward port for buster-wikimedia component/jdk8
  • 14:40 ema: rolling thumbor upgrade to 2.8-1+deb10u1 T252509 T219569 T236240
  • 14:39 andrewbogott: rebuilding cloudcontrol1003 and 1004
  • 14:38 hashar: 1.35.0-wmf.32 is on test wikis. Will be pushed to group0 later today during the American window (19:00 - 21:00 UTC) # T249964
  • 14:34 ema: thumbor2001: repool
  • 14:33 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - Test everywhere, SearchSatisfaction on testwiki only - T249261 (duration: 01m 06s)
  • 14:33 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.8-1+deb10u1 T252509 T219569 T236240
  • 14:23 moritzm: installing Java security updates on WDQS hosts
  • 14:20 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.32 (duration: 72m 04s)
  • 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 ema: thumbor2001: depool due to minor bug in 2.7-1+deb10u1 T252509 T219569 T236240
  • 13:54 ema: thumbor2001: pool thumbor 2.7-1+deb10u1 for prod traffic T252509 T219569 T236240
  • 13:50 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.7-1+deb10u1 T252509 T219569 T236240
  • 13:42 jbond42: disable puppet on all CP hosts to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/583342
  • 13:36 kormat: reimaging pc2007 to buster T252182
  • 13:36 moritzm: rebooting netflow* hosts for kernel update
  • 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:33 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm5 - T249335
  • 13:31 moritzm: rebooting deneb for kernel update
  • 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:08 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.32
  • 13:05 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.28 (duration: 23m 47s)
  • 12:37 moritzm: installing iputils update from Buster point release
  • 12:08 hashar: Cutting branch 1.35.0-wmf.32 # T249964
  • 12:08 gehel: restart blazegraph + updater on wdqs2002 - JVM upgrade
  • 11:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 11:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp5011 - T249335
  • 10:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:43 kormat: reimaging pc2010 to buster T252182
  • 10:30 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp4032 - T249335
  • 10:30 ema: rolling thumbor upgrade to 2.6-1+deb10u1 T226707
  • 10:19 ema: repool thumbor2001 with upgraded python-thumbor-wikimedia
  • 10:13 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.6-1+deb10u1
  • 10:04 godog: update compiler facts
  • 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:34 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:29 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: cluster=thanos
  • 09:07 moritzm: rebooting contint2001 for kernel update
  • 09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:46 godog: reboot thanos hosts for kernel upgrade
  • 07:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:41 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:12 moritzm: rebooting the IDP hosts, SSO sessions will need to be renewed
  • 07:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:56 vgutierrez: upload trafficserver 8.0.7-1wm4 to apt.wm.o (buster) - T242767 T249335
  • 05:29 marostegui: Restart docker-report-releng on deneb
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only=off for maintenance T251502', diff saved to https://phabricator.wikimedia.org/P11180 and previous config saved to /var/cache/conftool/dbconfig/20200512-050339-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T251502', diff saved to https://phabricator.wikimedia.org/P11179 and previous config saved to /var/cache/conftool/dbconfig/20200512-050054-marostegui.json
  • 04:46 marostegui: Stop mysql on labsdb1011 to transfer its content - T249188
  • 02:14 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:45 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-05-11

  • 21:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 Zoranzoki21: T235414 is wrong task number, T235415 is correct
  • 19:02 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.bollywoodhungama.in and *.britishmuseum.org to $wgCopyUploadsDomains (T235414, T251882) (duration: 00m 57s)
  • 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove "Create a book" link on enwiki (T241683) (duration: 00m 57s)
  • 18:44 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable modern Vector on officewiki, reveal preference on testwiki (T251285) (duration: 00m 58s)
  • 18:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add tw-photometa.de to $wgCopyUploadsDomains (T252141) (duration: 00m 58s)
  • 18:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:28 catrope@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop mainpage special casing for scowiki and itwiki (T252048, T252065) (duration: 00m 58s)
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/includes/Revision/RevisionStore.php: T252156 T212428 RevisionStore: fall back to master db if main slot is missing (duration: 00m 58s)
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/AbuseFilter/maintenance/updateVarDumps.php: updateVarDumps: wait for replication after each batch (duration: 00m 58s)
  • 17:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/skins/Vector/includes/VectorTemplate.php: T251521 Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
  • 17:24 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/skins/Vector/includes/VectorTemplate.php: T251521 Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
  • 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 16:47 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 04m 43s)
  • 16:42 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 16:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
  • 16:40 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 16:34 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
  • 16:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: mediawikiwiki to 1.35.0-wmf.31 (T249963) for testing T252179
  • 16:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:06 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaMaintenance: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 06s)
  • 16:05 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/UploadWizard: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 06s)
  • 16:04 hnowlan@deploy1001: Finished deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic (duration: 01m 58s)
  • 16:04 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Babel: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 07s)
  • 16:03 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Translate: Revert "Remove uses of WikiPage::doEditContent" (duration: 01m 08s)
  • 16:02 hnowlan@deploy1001: Started deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:52 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:49 cdanis@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=eventgate-analytics.*
  • 15:45 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:42 brennen: syncing backports to 1.35.0-wmf.31 (T249963) for T252179
  • 15:42 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:01 moritzm: installing puma security updates
  • 14:29 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:44 vgutierrez: upgrade ATS to 8.0.7-1wm4 in cp4032 - T249335
  • 13:36 hashar: Rolling back CI system switch to previous known state # T224591
  • 13:20 marostegui: Upgrade mysql package on s4 master in preparation for tomorrow's maintenance T251502
  • 12:50 hashar: Pointing CI Jenkins to contint2001 Gearman server T224591
  • 12:46 mutante: contint2001 - chown -R jenkins-slave:jenkins-slave /srv/.git
  • 12:45 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv/.git/
  • 12:43 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv-/org/.git/
  • 12:40 mutante: contint1001 - rsync -avz --delete /srv/org/wikimedia/integration/ rsync://contint2001.wikimedia.org/ci--srv-/org/wikimedia/integration/
  • 12:24 mutante: contint2001 - find /var/lib/jenkins/ -group bacula -exec chown jenkins:jenkins {} \;
  • 12:21 mutante: contint2001 - find /var/lib/jenkins/ -user statsite -exec chown jenkins {} \;
  • 12:19 mutante: contint2001 - chown -R jenkins:jenkins /srv/jenkins/*
  • 12:19 mutante: contint1001 - rsync -avz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/
  • 12:17 mutante: contint1001 - rsync -avz --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
  • 12:14 hashar: shutting down Zuul and Jenkins for system switch # T224591
  • 12:02 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:59 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:32 Lucas_WMDE: EU SWAT done
  • 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/WikimediaEvents/: SWAT: Update Banner Interaction Schema (T250791, wmf.30) (duration: 01m 08s)
  • 11:23 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaEvents/: SWAT: Update Banner Interaction Schema (T250791, wmf.31) (duration: 01m 07s)
  • 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 595478|Revert limit adjustment for Chinese translation with ContentTranslation (T252371) (duration: 01m 09s)
  • 10:58 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (595498) (duration: 01m 06s)
  • 10:56 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (595498) (duration: 01m 07s)
  • 10:15 vgutierrez: upload trafficserver 8.0.7-1wm3 to apt.wm.o (buster) - T242767 T249335
  • 09:44 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown jenkins:jenkins {} \;
  • 09:31 hashar: contint2001 started zuul-merger again (had permission issues in /var/lib/zuul )
  • 09:07 mutante: contint1001 - rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/ (T224591)
  • 09:05 mutante: contint2001 - mkdir /srv/jenkins
  • 08:55 hashar: contint2001 stopping zuul-merger , permission problem
  • 08:46 godog: bounce ferm on kubernetes1007 to resolve icinga UNKNOWN
  • 08:40 mutante: rsyncing /var/lib/jenkins from contint1001 to contint2001 with --delete
  • 08:32 mutante: rsynced data from contint1001 to contint2001 - paths per T224591#6039192 for the migration later today
  • 08:30 ema: cp3050: upgrade atskafka to 0.6 T237993
  • 08:30 _joe_: removing the iptables DROP rule on mc1020 T251378
  • 07:54 moritzm: installing squid security updates
  • 07:21 moritzm: updated buster netboot images to 10.4 (updated to latest point release)
  • 07:09 _joe_: dropping requests to mc1020 via a firewall rule T251378
  • 06:04 elukey: restart wikimedia-discovery-golden on stat1007 - apparently killed because the system had no memory left to allocate

2020-05-10

  • 12:18 marostegui: Start event scheduler on db1115 after a massive delete - T252324
  • 11:05 marostegui: Stop event scheduler on db1115 to perform a massive delete - T252324
  • 10:27 dcausse: restarting blazegraph on wdqs1004: T242453
  • 09:56 marostegui: Change scaling_governor from powersave to performance on db1115 - T252324
  • 09:25 marostegui: Stop MySQL and restart db1115 - T252324
  • 08:50 marostegui: Restart mysql on db1115 to change buffer pool size from 20GB to 40GB T252324
  • 08:44 elukey: Power cycle analytics1052 after eno1 issue
  • 08:01 marostegui: Disable unused events like %_schema T252324 T231185
  • 07:11 marostegui: Restart mysql on db1115 T231185
  • 07:11 marostegui: Truncate tendril.processlist_query_log T231185

2020-05-08

  • 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598
  • 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598
  • 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598
  • 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203
  • 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121
  • 17:59 marostegui: Extend /srv by 500G on labsdb1011 T249188
  • 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203
  • 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203
  • 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
  • 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
  • 14:05 akosiaris: T243106 undo experiment with DROP iptables rules this time around. Use mw1331, mw1348
  • 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335
  • 13:20 akosiaris: T243106 redo experiment with DROP iptables rules this time around. Use mw1331, mw1348
  • 13:16 akosiaris: T243106 undo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure.
  • 12:49 akosiaris: T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348
  • 12:49 akosiaris: T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle
  • 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
  • 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
  • 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
  • 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
  • 08:20 vgutierrez: rolling restart of ats-tls on esams - T249335
  • 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - T249335
  • 07:07 mutante: phabricator rmdir /var/run/phd/pid - empty and now unused
  • 07:01 moritzm: installing php5 security updates
  • 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:10 marostegui: Upgrade pc1010
  • 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for T252179
  • 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for T252179

2020-05-07

  • 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: Handle RevisionAccessException with try-catch (T252156) (duration: 01m 08s)
  • 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - T238230 (duration: 01m 07s)
  • 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
  • 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
  • 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - T252010
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types T252142 (duration: 01m 17s)
  • 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types T252142
  • 18:15 Urbanecm: Morning SWAT done
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 899c175: Update project icons to refreshed SVGs (T249047; part 2/2) (duration: 01m 06s)
  • 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 899c175: Update project icons to refreshed SVGs (T249047; part 1/2) (duration: 01m 08s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 54bd2f1: Add the investigate right to the checkuser group on testwiki (T251932) (duration: 01m 08s)
  • 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
  • 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
  • 17:18 ejegg: updated payments-wiki from afb84cc391 to dabba1804c
  • 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
  • 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
  • 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
  • 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
  • 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
  • 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
  • 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: T251460 Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
  • 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:27 vgutierrez: rolling restart of ats-tls on text@esams - T249335
  • 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
  • 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
  • 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
  • 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert T172489
  • 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
  • 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
  • 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
  • 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
  • 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: gerrit:594920 T252079 Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
  • 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
  • 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
  • 11:10 matthiasmullie: EU swat done
  • 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
  • 10:07 moritzm: installing Java security updates on restbase/sessionstore
  • 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
  • 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
  • 08:06 jynus: setting pc2007, pc2009 as read-write
  • 07:44 godog: further decrease weight for ms-be10[678] - T252008
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
  • 05:22 marostegui: Reimage db2078
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
  • 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for T252079
  • 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for T252079
  • 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
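
The entries above follow a regular shape: a UTC time, an actor (optionally `actor@host` when a tool logged on the user's behalf), a colon, and a free-form message. A minimal sketch of splitting a line into those parts, assuming that shape holds for every entry; the `Entry` type, `ENTRY_RE` and `parse_entry` are hypothetical names for illustration and not part of any Wikimedia tooling:

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str             # "HH:MM" as written in the log
    actor: str            # user or bot that logged the entry
    host: Optional[str]   # host after "@", if present
    message: str          # rest of the line, unparsed


# Optional bullet, time, actor[@host], colon, message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+"
    r"(?P<actor>[^\s@:]+)(?:@(?P<host>[^\s:]+))?:\s+(?P<message>.*)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for date headings and blank lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(m["time"], m["actor"], m["host"], m["message"])


if __name__ == "__main__":
    sample = "  • 15:27 vgutierrez: rolling restart of ats-tls on text@esams - T249335"
    print(parse_entry(sample))
```

Entries logged directly by a person (no `@host`) come back with `host=None`; date headings such as `2020-05-06` do not match and return `None`.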

2020-05-06

  • 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
  • 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias (T245791) (duration: 01m 07s)
  • 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: RevisionItem: Fix providing timestamp in getRevisionLink (duration: 01m 09s)
  • 21:45 andrewbogott: updating puppet compiler facts
  • 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:35 ejegg: updated Fundraising CiviCRM from b15b2cfbb5 to cfb6101e39
  • 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
  • 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
  • 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked (T249963), rolling forward to group1
  • 19:03 brennen: 1.35.0-wmf.31 train unblocked (T249963), rolling forward to group0
  • 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN T252052 (duration: 01m 09s)
  • 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
  • 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
  • 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
  • 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes T252043 (duration: 01m 08s)
  • 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as additional SAN (T243056)
  • 13:32 hashar: Restarting CI Jenkins
  • 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
  • 13:27 moritzm: installing graphicsmagick security updates
  • 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - T252010
  • 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - T252010
  • 13:19 ema: cp: upgrade purged to v0.10
  • 13:08 godog: start swift decom ms-be101[678] - T252008
  • 11:22 kart_: EU SWAT done.
  • 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 594668|Enable ContentTranslation in Armenian WP as a default tool (T249229) (duration: 01m 08s)
  • 10:27 ema: cp2027: test purged v0.10
  • 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
  • 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
  • 09:52 jbond42: enable remember-me feature of CAS
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
  • 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - T251158
  • 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
  • 08:53 jynus: kill FTWRL on db2101
  • 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 T99740 (duration: 01m 16s)
  • 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic (T99740)
  • 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 T250666', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
  • 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
  • 06:00 elukey: powercycle analytics1060 - host stuck - T251973
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out T250055', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
  • 05:02 marostegui: Deploy schema change on db1121
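
Many of the deploy and sync entries above end with a `(duration: MMm SSs)` suffix. A small sketch of turning that suffix into seconds, assuming the suffix always has this minutes-and-seconds form; `duration_seconds` is a hypothetical helper name:

```python
import re
from typing import Optional

# Matches the trailing "(duration: 01m 07s)" suffix seen on deploy/sync entries.
DURATION_RE = re.compile(r"\(duration:\s*(\d+)m\s*(\d+)s\)\s*$")


def duration_seconds(message: str) -> Optional[int]:
    """Return the logged duration in seconds, or None if the entry has none."""
    m = DURATION_RE.search(message)
    if m is None:
        return None
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return minutes * 60 + seconds


if __name__ == "__main__":
    msg = "Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 T99740 (duration: 01m 16s)"
    print(duration_seconds(msg))
```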

2020-05-05

  • 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki (T249643) (duration: 01m 06s)
  • 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
  • 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
  • 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
  • 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
  • 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
  • 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
  • 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: T251952 take 2 (duration: 01m 06s)
  • 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: T251952 (duration: 01m 05s)
  • 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: T251950 (duration: 01m 06s)
  • 20:02 herron: added ryankemper to wmf and ops ldap groups T251572
  • 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
  • 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
  • 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
  • 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% (T249963, T223287)
  • 18:39 cdanis: depool mw2221 for some manual testing
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
  • 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
  • 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
  • 16:48 brennen: 1.35.0-wmf.31 was branched at 4d3fed3 for T249963
  • 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 (T249963) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
  • 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 (T249963) at 16:30 UTC
  • 15:47 cstone: SmashPig revision changed from 8c30ed7fe5 to cd1a49da5f
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
  • 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
  • 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
  • 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. T219921
  • 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # T97513
  • 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:18 vgutierrez: upgrade ATS to version 8.1 () on cp4026, cp4032, cp5006 and cp5011
  • 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
  • 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
  • 12:37 XioNoX: push pfw policy - T251769
  • 12:07 jbond42: updating cas login page
  • 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
  • 11:47 moritzm: rolling restart of apache on kibana hosts
  • 11:41 mutante: LDAP - added eamedia to wmf group (T251358)
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 T248086', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T248086', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
  • 11:30 marostegui: Drop T248086_wb_terms table on labsdb hosts - T248086
  • 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
  • 11:22 kart_: EU SWAT done.
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 592479|Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383) (duration: 01m 01s)
  • 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
  • 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime T250666', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 T248086', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T248086', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
  • 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 (T251660)
  • 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 (T251575)
  • 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
  • 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:36 moritzm: removing boron.eqiad.wmnet
  • 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 09:03 gehel: restarting wdqs updater on all servers
  • 08:53 moritzm: installing Java security updates on releases*
  • 08:44 kormat: reimaging es1024 to buster T250666
  • 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy T144187 T249809
  • 08:26 moritzm: upgrading slapd on serpens/seaborgium
  • 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy T144187 T249809
  • 08:08 moritzm: installing Java security updates on notebook/stat hosts
  • 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
  • 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
  • 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
  • 06:59 addshore: depool wdqs1006 heavy lag
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance T251154', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance T251154', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
  • 05:19 marostegui: Start s5 and s6 maintenance - T251154
  • 04:39 marostegui: Restart mysql on tendril host: db1115 - T231769

2020-05-04

  • 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
  • 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
  • 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
  • 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
  • 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
  • 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
  • 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: T251835: Restore dc752af (duration: 00m 57s)
  • 22:16 eileen: process-control config revision is 2eb75f8dff
  • 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for T250887 (duration: 00m 57s)
  • 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for T250887 (duration: 00m 57s)
  • 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for T250887 (duration: 00m 57s)
  • 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086 (duration: 00m 05s)
  • 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086
  • 18:16 Urbanecm: Morning SWAT done
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c04fbdd: Adding upload_by_url user right to all registered users on Commons (T251474) (duration: 00m 57s)
  • 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: b85fc16: Enable on all ExtraSignaturesNamespaces (T249036) (duration: 01m 00s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 18c1efb: Load DiscussionTools on en.wiki (T249376) (duration: 00m 58s)
  • 17:57 XioNoX: configure singtel interface on cr1-eqsin
  • 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
  • 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a] (duration: 00m 09s)
  • 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a]
  • 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a] (duration: 16m 45s)
  • 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a]
  • 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
  • 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
  • 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
  • 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
  • 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
  • 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: T251457 rdbms: don't treat lock() as a write operation (duration: 01m 04s)
  • 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: T250393 Follow-up I07dd6f7: Fix font size in diff (duration: 01m 05s)
  • 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
  • 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
  • 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
  • 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279] (duration: 00m 10s)
  • 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279]
  • 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279] (duration: 15m 07s)
  • 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 ppchelko@deploy1001: Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints
  • 14:50 joal@deploy1001: Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279]
  • 14:19 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json
  • 14:15 XioNoX: add static nat for fran1001 - T251763
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2025 for reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json
  • 13:34 kormat: reimaging es2025 to buster T250666
  • 13:27 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json
  • 13:02 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T248664 Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s)
  • 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json
  • 12:10 marostegui: Temporarily enable slow query log on db1099:3311 - T206103
  • 12:09 Amir1: EU SWAT is done
  • 11:53 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase wmgMemoryLimit from 660MB to 666MB (duration: 01m 06s)
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 T206103 after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json
  • 11:46 tgr@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: Help panel: Check if guidance feature flag is set before loading mobile peek (T251589) (duration: 01m 06s)
  • 11:46 marostegui: Remove index tmp_2 from recentchanges on db1099:3311 T206103
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T206103 to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json
  • 11:43 tgr@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: Help panel: Check if guidance feature flag is set before loading mobile peek (T251589) (duration: 01m 10s)
  • 11:38 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
  • 11:30 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 4d00236: Enable cross-project search on frwikibooks (T251683) (duration: 01m 05s)
  • 11:25 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png (T251050)
  • 11:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 64556ba: Correct typo in Greek Wikiversity logo (T248391) (duration: 01m 06s)
  • 11:20 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png (T251050)
  • 11:20 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 3b8c618: Update jvwiki logos (T251050) (duration: 01m 05s)
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cc94ea7: Enable VisualEditor for more namespaces on vecwiki (T250419) (duration: 01m 07s)
  • 10:49 arturo: update packages in buster-wikimedia | thirdparty/kubeadm-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 (T250866)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 01m 05s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 01m 29s)
  • 10:39 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm3
  • 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:30 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia|thirdparty/kubeadm-k8s (T250866)
  • 09:46 vgutierrez: upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
  • 09:22 kormat: reimaging db1101 to buster T250666
  • 08:50 XioNoX: configure BGP peering with AS132203
  • 08:20 godog: add 50G to prometheus-ops on prometheus100[34]
  • 08:17 marostegui: Deploy schema change on s5 codfw - T251188
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
  • 07:31 marostegui: Drop unused flagged* tables from mediawikiwiki - T248298
  • 07:26 moritzm: removed jmorgan from cn=wmf
  • 07:24 marostegui: Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparation for tomorrow's restart - T251154
  • 07:24 moritzm: removed Kerberos principal for lexnasser and jmorgan
  • 07:23 moritzm: removed lexnasser from cn=nda
  • 07:07 elukey: execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
  • 06:41 elukey: upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia

2020-05-03

  • 22:52 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/593929
  • 22:42 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/591459
  • 21:37 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
  • 21:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service

2020-05-02

  • 07:49 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(49|5[0-9]|6[0-2])\.eqiad\.wmnet
  • 07:08 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
  • 02:36 volker-e@deploy1001: Finished deploy [design/style-guide@f0d467b]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:36 volker-e@deploy1001: Started deploy [design/style-guide@f0d467b]: Deploy design/style-guide:

2020-05-01

  • 19:56 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw13(5[6-9]|6[0-2]).eqiad.wmnet
  • 18:57 gehel: restart blazegraph on wdqs1006 - T242453
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json
  • 14:18 hknust: holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of T219279
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly warm up db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json
  • 13:06 hknust: holger@mwmaint1002 Starting renameInvalidUsernames.php as part of T219279
  • 13:01 vgutierrez: rolling restart of ats-tls in text@esams - T249335
  • 12:24 mutante: mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw
  • 12:20 mutante: mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw
  • 12:07 mutante: notebook1004 - puppet failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". killing 29038, running puppet T251560
  • 12:05 mutante: notebook1003 - puppet failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet T251560
  • 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 _joe_: depooled all servers in the app pool in rack D1
  • 08:54 oblivian@cumin1001: conftool action : set/pooled=no:weight=30; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet
  • 08:50 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet
  • 08:48 _joe_: repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled
  • 08:45 _joe_: repooling mw1409
  • 08:39 _joe_: repool mw1352
  • 08:37 _joe_: depooling mw1352
  • 07:44 marostegui: Copy wikireplica dump from labsdb1009 to labsdb1011 - T249188
  • 01:36 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s)
  • 01:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service
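
The conftool entries logged on 2020-05-01 and 2020-05-02 select hosts with regular expressions on `name`. The sketch below expands those three selectors into host ranges to show that the two depool actions together cover the same hosts as the later repool. It assumes the selector is matched as a regular expression against the full hostname, and the candidate host list (`mw1300` through `mw1399`) is hypothetical and used only for illustration; it is not an inventory of real hosts, and the helper `expand` is not part of conftool.

```python
import re

# Selectors copied verbatim from the log entries; labels are illustrative.
SELECTORS = {
    "depool 2020-05-01 08:54": r"mw13(49|5[0-5])\.eqiad\.wmnet",
    "depool 2020-05-01 19:56": r"mw13(5[6-9]|6[0-2]).eqiad.wmnet",
    "repool 2020-05-02 07:49": r"mw13(49|5[0-9]|6[0-2])\.eqiad\.wmnet",
}

# Hypothetical candidate hostnames, not a real inventory.
CANDIDATES = [f"mw{n}.eqiad.wmnet" for n in range(1300, 1400)]


def expand(selector, candidates):
    """Return the candidates that the selector matches in full."""
    pattern = re.compile(selector)
    return [host for host in candidates if pattern.fullmatch(host)]


for label, selector in SELECTORS.items():
    hosts = expand(selector, CANDIDATES)
    print(f"{label}: {hosts[0]} .. {hosts[-1]} ({len(hosts)} hosts)")
```

Under these assumptions the first selector covers mw1349 to mw1355, the second mw1356 to mw1362, and the third mw1349 to mw1362. The second selector leaves its dots unescaped, so in a regular expression they match any character; with this candidate list the result is unaffected.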

Archives

See Server admin log/Archives.