You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log

From Wikitech-static
Revision as of 00:37, 9 June 2021 by imported>Stashbot (catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikisource OCR on select Wikisources (T283898) (duration: 01m 31s))

2021-06-09

  • 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikisource OCR on select Wikisources (T283898) (duration: 01m 31s)
  • 00:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
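
The entries on this page appear to share one shape: a time, an actor (either `user` or `user@host`), a colon, and a free-text message. A minimal sketch of splitting an entry into those parts follows. The regular expression and the names `ENTRY_RE`, `LogEntry`, and `parse_entry` are assumptions made for illustration, inferred from the entries shown here; they are not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred from the entries on this page:
#   "HH:MM actor: message", where actor is "user" or "user@host".
# An optional leading bullet ("•") and indentation are tolerated.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^\s:]+): (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    time: str            # "HH:MM" as written in the log
    user: str            # part of the actor before "@"
    host: Optional[str]  # part after "@", or None if absent
    message: str         # everything after the first ": "


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return a LogEntry for a line in the assumed format, else None."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    user, _, host = match.group("actor").partition("@")
    return LogEntry(match.group("time"), user, host or None, match.group("message"))


if __name__ == "__main__":
    sample = "  • 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer"
    print(parse_entry(sample))
```

Date headings such as `2021-06-09` do not match the pattern, so `parse_entry` returns `None` for them and a caller can use that to track the current day.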

2021-06-08

  • 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
  • 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
  • 22:21 ryankemper: T284479 Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
  • 22:15 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
  • 22:14 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
  • 22:10 ryankemper: T284479 Yup more than enough evidence of a strong upward spike now. Proceeding to revert
  • 22:10 ryankemper: T284479 Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary but I'll likely be putting the block back in place shortly
  • 22:09 ryankemper: T284479 Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to see if we see a large spike in `full_text` and `entity_full_text` queries
  • 22:03 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
  • 22:01 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
  • 21:59 ryankemper: T284479 Prior context: We put a block on a range of Google App Engine IPs yesterday to protect Cirrussearch from a bad actor; now we're going to try lifting the block and seeing if we're still getting slammed with traffic
  • 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:29 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 21:27 ryankemper: T280382 Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
  • 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
  • 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
  • 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
  • 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9 refs T281150
  • 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
  • 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:36 ryankemper: T280382 Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
  • 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: Fix MonoBook orange banner hover styles (T284496) (duration: 01m 08s)
  • 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-2~wmf1
  • 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
  • 19:20 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:18 ryankemper: T280382 `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
  • 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
  • 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
  • 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
  • 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr{1,2}-eqiad
  • 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9 refs T281150 (duration: 34m 12s)
  • 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9 refs T281150
  • 16:27 papaul: powerdown moss-fe2002 for relocation
  • 16:06 papaul: powerdown ms-backup2002 for relocation
  • 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:40 papaul: powerdown ms-be2061 for relocation
  • 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 15:33 papaul: powerdown thanos-fe2003 for relocation
  • 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, T280605, T282761.
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:13 papaul: powerdown cp2034 for relocation
  • 15:04 papaul: powerdown cp2033 for relocation
  • 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light T164456
  • 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters T284106
  • 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master T282761 (duration: 00m 57s)
  • 14:05 kormat: setting pc1010 as pc3 primary T282761
  • 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 42s)
  • 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 47s)
  • 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 03s)
  • 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master T282761 (duration: 00m 57s)
  • 12:14 kormat: setting pc1008 back as pc2 primary T282761
  • 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef49422: enwiki: Disable indexing on the Book namespace (T283522) (duration: 00m 56s)
  • 11:46 urbanecm: Start server-side upload for 1 file (T283470)
  • 11:45 moritzm: installing nginx security updates on buster
  • 11:43 urbanecm: Start server-side upload for 2 files (T283645, T283583)
  • 11:39 urbanecm: EU B&C deployment done
  • 11:38 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16329 and previous config saved to /var/cache/conftool/dbconfig/20210608-113857-kormat.json
  • 11:38 moritzm: installing ruby-nokogiri security updates
  • 11:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/WikimediaEvents/: b0b4653: universalLanguageSelector: Add missing properties (T280770) (duration: 00m 56s)
  • 11:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: 5df13ee: Pass context to compact_language_links.open hook (T280770) (duration: 00m 57s)
  • 11:23 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16328 and previous config saved to /var/cache/conftool/dbconfig/20210608-112354-kormat.json
  • 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 3/3) (duration: 00m 58s)
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/config/lvwiki.yaml: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 2/3) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 1/3) (duration: 00m 57s)
  • 11:10 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=lvwiki growthexperiments # T278191
  • 11:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16327 and previous config saved to /var/cache/conftool/dbconfig/20210608-110850-kormat.json
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: abd4010: enwiki: Deploy Growth features to 2% of new accounts (T281896) (duration: 00m 57s)
  • 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
  • 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
  • 10:53 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json
  • 10:50 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)
  • 10:49 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
  • 10:16 liw: testing upcoming Scap release on beta
  • 10:01 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki2001 - T282469
  • 09:58 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)
  • 09:57 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
  • 09:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:04 jayme: removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - T251918
  • 08:31 dcausse: depooling wdqs1006 (lag)
  • 08:29 dcausse: restarting blazegraph on wdqs1006
  • 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:13 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 07:41 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 07:40 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json
  • 06:52 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json
  • 06:27 elukey: clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first)
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 04:54 marostegui: Repool clouddb1019:3314
  • 04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:38 ryankemper: T284445 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`
  • 02:37 ryankemper: T284445 after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)
  • 02:34 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)
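
The `(re)pooling` entries above for db1157 and db1161 step through 25%, 50%, 75%, and 100%, with consecutive commits about 15 minutes apart. The sketch below only reproduces that timing pattern; it is not how dbctl or any repooling script is implemented. The function name `repool_schedule`, the default steps, and the 15-minute interval are assumptions read off the timestamps in this log.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def repool_schedule(
    start: datetime,
    steps: Sequence[int] = (25, 50, 75, 100),          # assumed from the log entries
    interval: timedelta = timedelta(minutes=15),       # assumed from the log timestamps
) -> List[Tuple[datetime, int]]:
    """Return (time, percentage) pairs for a staged repool starting at `start`."""
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Start time taken from the first db1157 (re)pooling entry on 2021-06-08.
    for when, pct in repool_schedule(datetime(2021, 6, 8, 10, 53)):
        print(f"{when:%H:%M} (re)pooling @ {pct}%")
```

Starting from 10:53, the computed steps fall at 10:53, 11:08, 11:23, and 11:38, which matches the db1157 entries to the minute.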

2021-06-07

  • 21:26 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 21:12 sbassett: Deployed security patch for T284364
  • 19:30 ryankemper: T284479 [Cirrussearch] We'll keep monitoring. For now this incident is resolved. Glancing at our current volume relative to what we'd expect, the numbers we see match what we'd expect. If we're accidentally banning any innocent requests they must be an incredibly small percentage of the total otherwise we'd see significantly lower volume than expected
  • 19:25 ryankemper: T284479 [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now As a result we're no longer rejecting any requests
  • 19:21 ryankemper: T284479 [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`
  • 19:19 cdanis: T284479 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"
  • 19:07 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462 (duration: 04m 53s)
  • 19:02 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462
  • 19:01 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)
  • 18:59 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve
  • 18:57 herron: prometheus3001: moved /srv back to vda1 filesystem T243057
  • 18:26 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=T284149
  • 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 2/2) (duration: 00m 57s)
  • 18:22 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 1/2) (duration: 00m 56s)
  • 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/initWikiConfig.php: 7089728: b2482fb: initWikiConfig GE backports (T284072) (duration: 00m 58s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 3/3) (duration: 00m 56s)
  • 18:14 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 2/3) (duration: 00m 56s)
  • 18:14 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 18:14 ottomata: rolling restart of kafka jumbo brokers - T283067
  • 18:13 urbanecm@deploy1002: Synchronized wmf-config/config/skwiki.yaml: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 1/3) (duration: 00m 59s)
  • 18:12 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 18:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=skwiki growthexperiments # T284149
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5de2f8b: Set WelcomeSurveyEnableWithHomepage (T281896, T284257) (duration: 00m 59s)
  • 17:53 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 17:53 ottomata: rolling restart of kafka jumbo mirror makers - T283067
  • 17:17 ryankemper: [Cirrussearch] We're seeing ~10% of current requests being rejected by poolcounter, due to ~2x expected `eqiad.full_text` query volume and ~30x expected `eqiad.entity_full_text` query volume
  • 16:56 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph locked up)
  • 16:51 razzi: run homer '*.eqiad.wmnet' diff
  • 16:49 ottomata: restarting mysqld analytics-meta replica on db1108 to apply config change - T272973
  • 16:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6 (duration: 04m 29s)
  • 16:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6
  • 16:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6 (duration: 00m 35s)
  • 16:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6
  • 14:57 moritzm: installing remaining lz4 security updates on buster
  • 14:35 moritzm: installing isc-dhcp security updates
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P16315 and previous config saved to /var/cache/conftool/dbconfig/20210607-141722-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P16314 and previous config saved to /var/cache/conftool/dbconfig/20210607-141307-marostegui.json
  • 13:35 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3) (duration: 00m 52s)
  • 13:34 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3)
  • 13:34 moritzm: installing libxml2 security updates on stretch
  • 13:32 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 14s)
  • 13:31 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:28 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 54s)
  • 13:27 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 12:41 moritzm: removing now obsolete Java 8 packages from gerrit* T268225
  • 12:36 moritzm: removing now obsolete Java 8 packages from contint* T268225
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 moritzm: installing nginx security updates on buster
  • 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --add-prefix=BROKEN --fix # T284442
  • 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki # T284442
  • 11:09 Lucas_WMDE: EU backport+config window done
  • 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add 2021 namespaces for wikimania wiki (T284235) (duration: 00m 56s)
  • 10:48 volans: reset netbox-next DB with the latest prod dump
  • 10:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:41 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 10:38 godog: downgrade grafana to 7.4.2 on grafana2001 - T282863
  • 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
  • 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 10:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 10:28 kormat: reimaging db1157 T283131
  • 10:24 moritzm: remove now obsolete nginx mods and dependencies on htmldumper1001 T164456
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 depooling: reimage to buster T283131', diff saved to https://phabricator.wikimedia.org/P16311 and previous config saved to /var/cache/conftool/dbconfig/20210607-100822-kormat.json
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:43 moritzm: upgrading bullseye hosts to latest packages in testing
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 09:03 moritzm: installing imagemagick security updates on stretch
  • 06:05 marostegui: Upgrade mysql on dbstore1003 T283235
  • 05:57 marostegui: Stop dbstore1004 to clone dbstore1007 T283125
  • 05:37 marostegui: Depool clouddb1020 (s5, s8) for upgrade
  • 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
  • 04:48 marostegui: Depool clouddb1019:3314 (long running alter table)
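
The 19:21 entry above describes the mitigation as a conjunction: a request is banned only if its source address falls in `2600:1900::0/28` AND its user agent includes `HeadlessChrome`. The sketch below illustrates that matching logic with Python's standard `ipaddress` module. It is not the deployed rule, which lives in the Puppet change linked in that entry; the function name `is_blocked` is hypothetical, and the restriction to search API requests is left out for brevity.

```python
import ipaddress

# Values taken from the 19:21 log entry; everything else here is illustrative.
BLOCKED_NETWORK = ipaddress.ip_network("2600:1900::0/28")
BLOCKED_UA_SUBSTRING = "HeadlessChrome"


def is_blocked(client_ip: str, user_agent: str) -> bool:
    """Return True only when BOTH conditions hold: IP in range AND UA matches."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network, so this is False for IPv4.
    in_range = addr in BLOCKED_NETWORK
    return in_range and BLOCKED_UA_SUBSTRING in user_agent


if __name__ == "__main__":
    print(is_blocked("2600:1900::1", "Mozilla/5.0 HeadlessChrome/90.0"))  # in range, UA matches
    print(is_blocked("2600:1900::1", "Mozilla/5.0 Chrome/90.0"))          # in range, UA differs
    print(is_blocked("2600:1910::1", "Mozilla/5.0 HeadlessChrome/90.0"))  # outside the /28
```

Requiring both conditions keeps the rule narrow, which fits the 19:30 observation that any innocent requests caught by the ban must be a very small share of total traffic.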

2021-06-05

  • 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now (T282303)
  • 15:21 Amir1: delete mbox files of group D and E in mm2 (T282303)
  • 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
  • 00:18 mutante: backup1001 systemctl reload bacula-dir fails

2021-06-04

  • 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
  • 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
  • 20:59 bblack: repool cp1087 - T278729
  • 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 19:06 bblack: depool cp1087 - T278729
  • 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
  • 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I434d9c (duration: 00m 56s)
  • 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: Iea41ab (duration: 00m 56s)
  • 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: Iea41ab (duration: 00m 59s)
  • 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref T282761
  • 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:46 marostegui: Upgrade mysql on clouddb1016 T283235
  • 12:27 marostegui: Upgrade mysql on clouddb1015 T283235
  • 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
  • 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
  • 09:47 ema: pool cp1087 T278729
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
  • 09:06 ema: reboot cp1087 T278729
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16302 and previous config saved to /var/cache/conftool/dbconfig/20210604-084735-root.json
  • 08:33 marostegui: Upgrade db1110 T283235
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16301 and previous config saved to /var/cache/conftool/dbconfig/20210604-083232-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16300 and previous config saved to /var/cache/conftool/dbconfig/20210604-082956-marostegui.json
  • 08:20 godog: upgrade karma to 0.86-1
  • 07:38 jynus: stop and upgrade db1150 T283235
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16299 and previous config saved to /var/cache/conftool/dbconfig/20210604-073326-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16298 and previous config saved to /var/cache/conftool/dbconfig/20210604-073318-root.json
  • 07:29 moritzm: cleanup now unused nginx mods and former deps on install* and puppetdb* servers after switch towards nginx-light (various X11 libs and libxslt) T164456
  • 07:24 moritzm: cleanup now unused nginx mods and former deps on install* servers after switch towards nginx-light (various X11 libs and libxslt)
  • 07:19 urbanecm: Password reset for SUL User:Dominic_Mayers (T282656)
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16297 and previous config saved to /var/cache/conftool/dbconfig/20210604-071823-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16296 and previous config saved to /var/cache/conftool/dbconfig/20210604-071815-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16295 and previous config saved to /var/cache/conftool/dbconfig/20210604-070319-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16294 and previous config saved to /var/cache/conftool/dbconfig/20210604-070311-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16293 and previous config saved to /var/cache/conftool/dbconfig/20210604-064815-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16292 and previous config saved to /var/cache/conftool/dbconfig/20210604-064807-root.json
  • 06:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:42 marostegui: Upgrade mysql on db1096:3315 db1096:3316
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 db1096:3315', diff saved to https://phabricator.wikimedia.org/P16291 and previous config saved to /var/cache/conftool/dbconfig/20210604-064242-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16290 and previous config saved to /var/cache/conftool/dbconfig/20210604-055521-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16289 and previous config saved to /var/cache/conftool/dbconfig/20210604-054017-root.json
  • 05:26 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16288 and previous config saved to /var/cache/conftool/dbconfig/20210604-052514-root.json
  • 05:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:17 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:16 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json
  • 04:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:25 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:22 ryankemper: T280382 `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:33 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph`
  • 02:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:30 ryankemper: T280382 `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:25 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag)
  • 02:09 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 02:06 ebernhardson: post-deploy restart airflow-(webserver|scheduler) on an-airflow1001
  • 02:05 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s)
  • 02:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift
  • 01:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 57s)
  • 00:07 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 00:06 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)

2021-06-03

  • 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s)
  • 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s)
  • 23:33 mutante: installing OS on fresh VM doh5001
  • 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Restrict changetags to sysops and bots on meta T283625 (duration: 00m 58s)
  • 22:41 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 22:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper: T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
  • 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:35 ryankemper: T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
  • 20:47 sbassett: Deployed security patch for T282932
  • 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
  • 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
  • 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service
  • 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
  • 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
  • 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 - icinga-wm> RECOVERY - Check systemd state on deneb is OK
  • 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
  • 19:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
  • 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456
  • 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
  • 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
  • 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant (T164456)
  • 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
  • 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
  • 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
  • 17:16 urandom: remove dropped Cassandra keyspace snapshots -- T258414
  • 16:55 ejegg: updated payments-wiki from 6fac77f60e to 7be0534b91
  • 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
  • 15:27 papaul: pdu replacement complete
  • 15:25 moritzm: upgrading gitlab to 13.11.5
  • 15:08 papaul: disconnect ps2-d8-codfw for replacement
  • 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
  • 14:23 moritzm: installing nginx security updates on buster
  • 14:12 moritzm: installing postgresql-9.6 security updates
  • 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
  • 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
  • 12:03 moritzm: installing lz4 security updates on buster
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
  • 11:53 moritzm: installing curl security updates on stretch
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e840968: jawiki: extended confirmed should be 120 days since first edit, not registration (T284212) (duration: 00m 58s)
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
  • 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:41 godog: test librenms/AM paging
  • 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
  • 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
  • 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary T282761 (duration: 00m 58s)
  • 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - T282373 T282372 T282371
  • 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
  • 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
  • 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
  • 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
  • 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
  • 06:23 ryankemper: T280382 `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:23 ryankemper: T280382 `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - T266486 T268392 T273360
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
  • 04:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:36 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:30 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:29 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
  • 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:07 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:05 ryankemper: T280382 `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:51 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 01:47 ryankemper: T280382 `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
  • 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650), Try II (duration: 00m 57s)
  • 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: user: Accept options-messages for multiselect user options (T58633 T278650) (duration: 00m 57s)
  • 00:35 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:18 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)

2021-06-02

  • 23:57 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:47 ryankemper: T280382 `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:26 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: Allow html form field option 'options-messages' to get parsed (T58633) (duration: 01m 01s)
  • 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgVectorConsolidateUserLinks on the beta cluster (T266536) (duration: 00m 57s)
  • 22:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
  • 22:34 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P{apt*}' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:30 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P{install*}' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:19 Amir1: setting charset of all tables in wikitech to binary (T284108 T269348)
  • 22:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
  • 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
  • 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:59 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
  • 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
  • 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool` (catching up on 17.9h lag)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 21:10 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
  • 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
  • 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9c981d: Revert "enwiktionary: Raise AF emergency disable treshold+count" (T283460) (duration: 00m 58s)
  • 18:11 urbanecm: Deployed security patch for T281972
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4bf76fc: Make DiscussionTools replytool available for everyone on wikitech (T283119) (duration: 00m 58s)
  • 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
  • 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
  • 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
  • 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 12:05 jbond: enable puppet fleet wide. post changing puppetdb to use nginx-light #T164456
  • 11:54 jbond: disable puppet fleet wide. changing puppetdb to use nginx-light #T164456
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: 85feaa1: InfoAction: Cast wgNamespaceProtection to array (T283751) (duration: 01m 00s)
  • 11:08 jbond: update mod_auth_cas T264605
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f12e368: Investigate MediaSearch usability on other wikis (T278984) (duration: 00m 57s)
  • 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #T264605
  • 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #T264605
  • 10:44 topranks: Commit pfw policy 1622570851 to pfw3-codfw and pfw3-eqiad to support new host fran2001 (T282056)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
  • 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280396 (request)' 'OTRS' 'VRT' 'Quiddity (WMF)' # T284118
  • 08:12 moritzm: removed eight inactive addresses from ops@ list
  • 07:44 moritzm: installing squid security updates
  • 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
  • 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
  • 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix pageterms API call for Special:Nearby in Wikidata (T281639) (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
  • 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
  • 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
  • 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
  • 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
  • off: restart tcpircbot-logmsgbot on alert1001 - T284123
  • 04:56 marostegui: Test
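
The Cookbook START / END entries in this log follow a regular shape, so a small parser can pick out runs that did not pass. The Python sketch below is an illustration only: the regular expression and the names `parse_cookbook_line` and `failed_runs` are assumptions inferred from the entries, not part of any Wikimedia tooling. The sample lines are copied from the 2021-06-02 entries.

```python
import re

# Assumed shape of a cookbook log line, inferred from the entries in this log.
COOKBOOK_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<who>\S+@\S+): "
    r"(?:START|END \((?P<result>\w+)\)) - Cookbook (?P<name>[\w.\-]+)"
    r"(?: \(exit_code=(?P<code>\d+)\))?"
)


def parse_cookbook_line(line):
    """Return a dict for a cookbook START/END line, or None for any other entry."""
    match = COOKBOOK_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # START lines carry no result or exit code, so both stay None.
    if fields["code"] is not None:
        fields["code"] = int(fields["code"])
    return fields


def failed_runs(lines):
    """Return parsed END entries whose result is anything other than PASS."""
    parsed = (parse_cookbook_line(line) for line in lines)
    return [p for p in parsed if p and p["result"] not in (None, "PASS")]


SAMPLE = [
    "22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE",
    "22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE",
    "22:19 Amir1: setting charset of all tables in wikitech to binary (T284108 T269348)",
    "20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org",
    "20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org",
]

if __name__ == "__main__":
    for run in failed_runs(SAMPLE):
        print(run["time"], run["who"], run["name"], run["result"], run["code"])
```

Entries that are not cookbook lines (for example free-text notes) are skipped, not treated as errors, because the log mixes several entry types.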

2021-06-01

  • 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per T284108
  • 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists (T282303)
  • 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
  • 15:38 legoktm: stopped mailman2 service on lists1001 (T52864)
  • 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 15:16 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 14:59 topranks: Restoring Lumen CCT 442550293 to normal metric / bring back into service (T274234)
  • 13:56 marostegui: Stop mysql on db2079 (codfw master) - T283743
  • 13:53 topranks: Draining Lumen CCT 442550293 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f75774: cawiki: Fix help panel links (T280673) (duration: 00m 58s)
  • 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973 (duration: 02m 58s)
  • 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973
  • 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service (T274234)
  • 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 12:12 dcausse: re-pooling wdqs1005 (caught-up lag)
  • 12:06 moritzm: installing djvulibre security updates
  • 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4989d2: Enable "Diff" RSS feed on meta (T283380) (duration: 00m 58s)
  • 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # T268225
  • 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # T268225
  • 07:26 dcausse: depooling wdqs1005 (lag)
  • 07:14 moritzm: installing nginx security updates
  • 05:56 legoktm: restarting mailman3 on lists1001
  • 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
  • 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 07s)
  • 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 00s)
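
The db1147 dbctl entries for this day show a staged repool (25%, 50%, 75%, 100%). The Python sketch below extracts those steps and the gap between them. It is a hedged illustration: the pattern and the names `repool_steps` and `step_gaps_minutes` are assumptions based on the log text, not dbctl's own interface, and the sample entries are cut off after the commit message.

```python
import re
from datetime import datetime

# Assumed shape of a staged-repool entry, inferred from the log text.
REPOOL = re.compile(
    r"^(\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)

# Entries copied from the 2021-06-01 log, truncated after the commit message.
ENTRIES = [
    "05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147'",
    "05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147'",
    "04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147'",
    "04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147'",
]


def repool_steps(entries):
    """Return (time, instance, percent) tuples in chronological order."""
    steps = []
    for entry in entries:
        match = REPOOL.match(entry)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M")
            steps.append((when, match.group(2), int(match.group(3))))
    # The log is newest-first, so sort to get oldest-first.
    return sorted(steps)


def step_gaps_minutes(steps):
    """Return the whole minutes between consecutive steps (same-day entries only)."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    steps = repool_steps(ENTRIES)
    print([percent for _, _, percent in steps])
    print(step_gaps_minutes(steps))
```

The sketch only compares clock times within one day; entries spanning midnight would need the date heading as well.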

2021-05-31

  • 07:32 legoktm: deleted all outgoing list mail that is for a gmail address being unsubscribed T284003
  • 07:30 legoktm: deleted all outgoing list mail that is for a yahoo/aol address being unsubscribed T284003
  • 07:23 legoktm: deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" T284003
  • 06:33 legoktm: manually unsubscribed ahalfaker [at] wikimedia.org from scoring-internal list, triggering mailman bounce loop T282348#7124014
  • 06:22 legoktm: sudo systemctl restart mailman3 on lists1001, bounce runner crashed

2021-05-29

  • 14:44 elukey: execute apt-get clean on an-airflow1001 to free space
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet

2021-05-28

2021-05-27

  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:45 thcipriani@deploy1002: Synchronized README: Config: Revert "README: deployment training" (duration: 00m 55s)
  • 23:38 derick@deploy1002: Synchronized README: Config: README: deployment training (duration: 00m 55s)
  • 23:21 egardner@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable MediaSearch Assessment filter (T276257) (duration: 00m 57s)
  • 22:06 urbanecm: Invalidate bot password for `PKM@PKMbot` (T283839)
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda #T279545
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda
  • 19:53 James_F: Manually create missing SecurePoll DB tables on mnwwiktionary, taywiki, and trvwiki for T283844
  • 19:48 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.7
  • 19:15 tgr: US morning deploys done
  • 19:12 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable Add Links for 50% of new users and all old ones (T277356) (duration: 01m 04s)
  • 19:03 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 05s)
  • 18:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:56 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ptwiki: Add 'flow-delete' to 'eliminator' user group (T283266) (duration: 01m 04s)
  • 18:49 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 06s)
  • 18:22 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:09 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Growth's community configuration on the pilot wikis (T283809) (duration: 01m 06s)
  • 17:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 James_F: Running SecurePoll maintenance script cli/updateNotBlockedKey.php for all wikis T277079
  • 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:58 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs1006's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_disk`
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs2004's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_disk`
  • 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 ryankemper: T280382 (fixing a couple of wrong host names in the last log line) `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2001`, `wdqs2002`, and `wdqs2008` have a 975G `wikidata.jnl`
  • 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2004` and `wdqs2001` have a 975G `wikidata.jnl`. It's not clear why there's such a big divergence
  • 15:41 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`
  • 15:12 XioNoX: test netconf over ssh on cr3-ulsfo
  • 15:03 effie: disable puppet mc2019
  • 14:14 moritzm: bounce keyholder-agent on cumin2001 to drop homer key (now on 2002 only)
  • 12:57 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
  • 12:55 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 12:50 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master T282761 (duration: 01m 04s)
  • 12:47 tgr: EU deploys done
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Always delete from search index in AddLinkSubmissionHandler (T283606) (duration: 01m 06s)
  • 12:40 topranks: cr2-eqord: Gerrit 696383: Removing IPv4 Anycast ranges from bgp_out policy.
  • 12:39 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Add Link: Prevent double-opening of the post-edit dialog (T283120) (duration: 01m 06s)
  • 12:25 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 11:51 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
  • 10:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
  • 09:46 kormat: restarting mariadb on pc1007 to upgrade it
  • 08:35 topranks: removing stale peers (AS8674 / Netnod and AS57695 / Misaka) from cr2-esams
  • 08:30 moritzm: installing libx11 security updates
  • 07:45 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in AMS
  • 07:44 legoktm: adding stephane at kiwix as owner of offline-l per email
  • 07:43 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in eqsin
  • 07:42 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr2-eqord
  • 07:20 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to announce Wikidough Anycast range from cr's in ulsfo
  • 07:14 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr1-eqdfw
  • 07:11 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr2-codfw
  • 06:47 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 06:43 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 13s)
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16227 and previous config saved to /var/cache/conftool/dbconfig/20210527-060953-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P16226 and previous config saved to /var/cache/conftool/dbconfig/20210527-055507-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16225 and previous config saved to /var/cache/conftool/dbconfig/20210527-055450-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16224 and previous config saved to /var/cache/conftool/dbconfig/20210527-053946-root.json
  • 05:29 ryankemper: `ryankemper@cloudelastic1003:~$ sudo run-puppet-agent --force`
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16223 and previous config saved to /var/cache/conftool/dbconfig/20210527-052442-root.json

2021-05-26

  • 23:07 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 23:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 22:17 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 06s)
  • 22:10 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 07s)
  • 21:22 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 19:58 twentyafterfour: finished deploying wmf.7 and error levels appear unchanged. refs T281148
  • 19:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:51 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.7 (duration: 01m 07s)
  • 19:50 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train (duration: 05m 12s)
  • 19:50 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.7
  • 19:45 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train
  • 19:44 twentyafterfour: train is unblocked, proceeding to deploy wmf.7 to group1 wikis refs T281148
  • 19:44 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 19:44 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN
  • 19:43 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3 (duration: 01m 00s)
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3
  • 19:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 9f3410b: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 512d72e: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:25 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: 80abdf9: 92d2952: Enable VisualEditor by default at ptwikinews and plwikinews (T282846, T283033) (duration: 01m 09s)
  • 19:21 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 2
  • 19:17 legoktm: legoktm@deploy1002:~$ sudo -E kubectl delete pod kask-production-6d6869b697-m2qjs -n sessionstore
  • 19:16 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train (duration: 01m 23s)
  • 19:15 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train
  • 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f66b3b: Enable wgCiteResponsiveReferences on svwiki (T281622) (duration: 01m 06s)
  • 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 07b804b: Enable DiscussionTools on wikitech (T283119) (duration: 01m 05s)
  • 17:51 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:39 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:16 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Set $wgShellboxSecretKey - T281423 (duration: 01m 14s)
  • 17:02 moritzm: restarting FPM on mw canaries to pick up libx11 update
  • 16:51 moritzm: installing libx11 security updates
  • 16:38 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr2-codfw - Wikidough Anycast
  • 16:12 marostegui: Reboot db2107 (codfw master) T282072
  • 16:10 marostegui: Reboot db2103 (codfw master) T282072
  • 16:09 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:09 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:01 papaul: powerdown ms-be2038 for BBU replacement
  • 15:41 effie: enable puppet on mc2019
  • 15:31 marostegui: Cold reset db2107 idrac T283727
  • 15:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:23 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:22 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr1-codfw - Wikidough Anycast
  • 15:18 urbanecm: otrs_wikiwiki was moved to vrt-wiki.wikimedia.org (T280400)
  • 15:12 topranks: Merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/694305/ - Add Wikidough Anycast range to network config
  • 15:11 urbanecm@deploy1002: Synchronized wmf-config/: 490435e: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400) (duration: 01m 07s)
  • 15:08 urbanecm@deploy1002: Synchronized multiversion/MWMultiVersion.php: 945ee9c: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400; 1/2) (duration: 01m 06s)
  • 15:02 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 18s)
  • 14:59 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST (duration: 05m 24s)
  • 14:53 otto@deploy1002: Started deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST
  • 14:50 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 14:49 otto@deploy1002: Started deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN
  • 14:49 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 30m 22s)
  • 14:47 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:31 moritzm: updated bullseye d-i image to 2021-05-26 daily image T275873
  • 14:19 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:18 otto@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 00m 06s)
  • 14:18 otto@deploy1002: Started deploy [analytics/refinery@e536abd]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 14s)
  • 14:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
  • 14:03 hashar@deploy1002: Finished deploy [integration/docroot@ebee5d3]: composer/npm updates (duration: 00m 09s)
  • 14:03 hashar@deploy1002: Started deploy [integration/docroot@ebee5d3]: composer/npm updates
  • 11:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)
  • 11:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)
  • 11:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 06s)
  • 11:26 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 19s)
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Test Wikidata: Enable empty list to object serialization (T241422) (duration: 01m 19s)
  • 10:26 moritzm: installing lz4 security updates on buster
  • 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 01m 18s)
  • 09:45 godog: rm /root/prometheus from prometheus5001 - old transition files
  • 09:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 02m 12s)
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16222 and previous config saved to /var/cache/conftool/dbconfig/20210526-093647-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16221 and previous config saved to /var/cache/conftool/dbconfig/20210526-092144-root.json
  • 09:13 elukey: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/695192 on {cr1|cr2}-eqiad - T225005
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16220 and previous config saved to /var/cache/conftool/dbconfig/20210526-090640-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16219 and previous config saved to /var/cache/conftool/dbconfig/20210526-085137-root.json
  • 08:12 _joe_: purging images on deneb
  • 08:11 kormat: running 'optimize table' over parsercache db on pc1007 with replication enabled T282761
  • 07:14 ryankemper: Pooled `wdqs1013` (caught up on lag), de-pooled `wdqs2003` (should not have been pooled due to reimage failure)
  • 07:13 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs2003.codfw.wmnet
  • 05:46 marostegui: Stop MySQL on clouddb1021 to upgrade mysql
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16215 and previous config saved to /var/cache/conftool/dbconfig/20210526-051935-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P16214 and previous config saved to /var/cache/conftool/dbconfig/20210526-050919-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16213 and previous config saved to /var/cache/conftool/dbconfig/20210526-050431-root.json
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16212 and previous config saved to /var/cache/conftool/dbconfig/20210526-044928-root.json
  • 04:35 marostegui: Deploy schema change on db1106, this will generate lag on s1 (enwiki) on wiki replicas T266486 T268392 T273360
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16211 and previous config saved to /var/cache/conftool/dbconfig/20210526-043439-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16210 and previous config saved to /var/cache/conftool/dbconfig/20210526-043424-root.json
  • 03:29 eileen: process-control config revision is 7b646533da
  • 00:47 eileen: civicrm revision changed from 584b96452a to eac772e9c9, config revision is 2ca92c3c3c
  • 00:27 mutante: phab2001 - restarted apache2

2021-05-25

  • 23:09 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 22:39 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:04 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:04 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:58 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:58 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:40 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:28 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.7
  • 19:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.7
  • 18:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I2ebe96 (duration: 00m 56s)
  • 17:34 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761.
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json
  • 17:22 effie: disable puppet on mc2019 (for tests)
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json
  • 12:55 urbanecm@deploy1002: Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:52 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json
  • 11:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:21 Lucas_WMDE: EU backport&config window done
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change HTTP to HTTPS for concept URIs on Commons (T258590) (duration: 00m 56s)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16196 and previous config saved to /var/cache/conftool/dbconfig/20210525-111719-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16195 and previous config saved to /var/cache/conftool/dbconfig/20210525-110215-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16194 and previous config saved to /var/cache/conftool/dbconfig/20210525-104711-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16193 and previous config saved to /var/cache/conftool/dbconfig/20210525-103208-root.json
  • 09:58 ema: cp3054: upgrade varnish to latest LTS (6.0.7-1wm1) T264398
  • 09:28 jynus: updating puppet facts on cloud from puppetmaster1001
  • 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:01 kormat: stopping replication on pc1010 T282761
  • 09:00 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc1 primary T282761 (duration: 00m 58s)
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:18 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:17 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:14 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16192 and previous config saved to /var/cache/conftool/dbconfig/20210525-080234-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P16191 and previous config saved to /var/cache/conftool/dbconfig/20210525-074950-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16190 and previous config saved to /var/cache/conftool/dbconfig/20210525-074730-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16189 and previous config saved to /var/cache/conftool/dbconfig/20210525-073227-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16188 and previous config saved to /var/cache/conftool/dbconfig/20210525-071723-root.json
  • 06:16 kart_: Updated cxserver to 2021-05-15-034540-production (T276214)
  • 06:05 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:58 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:53 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:14 marostegui: Reload daily_account_consistency_check.service on mwmaint1002
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16187 and previous config saved to /var/cache/conftool/dbconfig/20210525-050921-root.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16186 and previous config saved to /var/cache/conftool/dbconfig/20210525-045417-root.json
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16185 and previous config saved to /var/cache/conftool/dbconfig/20210525-043914-root.json
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P16184 and previous config saved to /var/cache/conftool/dbconfig/20210525-043234-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P16183 and previous config saved to /var/cache/conftool/dbconfig/20210525-043129-marostegui.json
  • 04:25 marostegui: Stop MySQL on dbstore1004 to clone dbstore1006 T283125
  • 04:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16181 and previous config saved to /var/cache/conftool/dbconfig/20210525-042410-root.json
  • 02:06 James_F: 1.37.0-wmf.7 was branched at 7ee6a2e for T281148 by the TrainBranchBot
  • 00:48 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:44 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 00:37 bstorm: labstore1007 downtimed for maintenance T281045

2021-05-24

  • 21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 19:32 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:23 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:20 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:15 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:33 urbanecm: Morning B&C deployment done
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9cd344: Disable Education Program namespaces in hewiki (T217137) (duration: 00m 56s)
  • 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/skins/Vector/: 1742532687b: Introduce the vector-body class (T283206) (duration: 00m 57s)
  • 17:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:35 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:17 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:14 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1022.eqiad.wmnet
  • 15:55 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1022.eqiad.wmnet
  • 15:52 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:47 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:45 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:41 twentyafterfour: deploying phabricator hotfix (and restarting php7.3-fpm on phab1001)
  • 15:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1021.eqiad.wmnet
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16176 and previous config saved to /var/cache/conftool/dbconfig/20210524-150926-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16175 and previous config saved to /var/cache/conftool/dbconfig/20210524-145422-root.json
  • 14:50 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1021.eqiad.wmnet
  • 14:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1020.eqiad.wmnet
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16174 and previous config saved to /var/cache/conftool/dbconfig/20210524-143919-root.json
  • 14:36 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1020.eqiad.wmnet
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16173 and previous config saved to /var/cache/conftool/dbconfig/20210524-142415-root.json
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:40 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:37 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:33 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 12:18 urbanecm: Uninstalling Flow from ruwiki: Delete all pages in NS2600 (Flow's Topic) in ruwiki via deleteBatch.php (T282132; P16170)
  • 12:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 47e040b: ruwiki: Uninstall Flow (T282132) (duration: 00m 56s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16169 and previous config saved to /var/cache/conftool/dbconfig/20210524-113711-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16168 and previous config saved to /var/cache/conftool/dbconfig/20210524-112011-root.json
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1129e01: Remove wgGEMentorshipMigrationStage (T279853) (duration: 00m 57s)
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16167 and previous config saved to /var/cache/conftool/dbconfig/20210524-110508-root.json
  • 11:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 829c61d: Deploy Growth features to newcomers on bgwiki, urwiki (T280824, T280067) (duration: 00m 56s)
  • 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16166 and previous config saved to /var/cache/conftool/dbconfig/20210524-105004-root.json
  • 10:35 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 16s)
  • 10:35 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16165 and previous config saved to /var/cache/conftool/dbconfig/20210524-103501-root.json
  • 10:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@a9a577a]: (no justification provided) (duration: 00m 15s)
  • 10:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@a9a577a]: (no justification provided)
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16164 and previous config saved to /var/cache/conftool/dbconfig/20210524-075958-root.json
  • 07:49 XioNoX: bump Equinix Chicago RS max prefix
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16163 and previous config saved to /var/cache/conftool/dbconfig/20210524-074659-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16162 and previous config saved to /var/cache/conftool/dbconfig/20210524-074454-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16161 and previous config saved to /var/cache/conftool/dbconfig/20210524-072950-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16160 and previous config saved to /var/cache/conftool/dbconfig/20210524-071447-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 - schema change', diff saved to https://phabricator.wikimedia.org/P16159 and previous config saved to /var/cache/conftool/dbconfig/20210524-052747-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16158 and previous config saved to /var/cache/conftool/dbconfig/20210524-051345-root.json
  • 05:09 legoktm: restarting mailman3 on lists1001, bounce runner crashed
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16157 and previous config saved to /var/cache/conftool/dbconfig/20210524-045841-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16156 and previous config saved to /var/cache/conftool/dbconfig/20210524-044337-root.json
  • 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P16155 and previous config saved to /var/cache/conftool/dbconfig/20210524-043654-marostegui.json
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16154 and previous config saved to /var/cache/conftool/dbconfig/20210524-042834-root.json

2021-05-23

  • 14:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: EMERGENCY: f752f8b: enwiktionary: Raise AF emergency disable threshold+count (T283460) (duration: 00m 57s)

2021-05-22

  • 22:13 legoktm: reset 2FA for User:Yuvipanda on wikitech
  • 21:07 ryankemper: [WDQS] Pooled `wdqs1006` (caught up on lag), de-pooled `wdqs1013` (8 hours)
  • 16:35 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php cswiki --delete

2021-05-21

  • 22:32 bstorm: upload nfsd-ldap: 1.2+deb10u1 to buster-wikimedia T283385
  • 18:24 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:22 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:14 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:39 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:36 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:28 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 19s)
  • 17:21 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:17 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:01 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:19 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:14 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:07 clarakosi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:20 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:59 reedy@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 12:56 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:34 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetdb-api
  • 12:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:24 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=docker-registry
  • 12:23 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=docker-registry
  • 12:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16150 and previous config saved to /var/cache/conftool/dbconfig/20210521-122253-root.json
  • 12:15 topranks: "Removing BGP peering sessions to LinkedIn AS14413 at AMS-IX / cr2-esams as they are no longer on the exchange."
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16149 and previous config saved to /var/cache/conftool/dbconfig/20210521-120749-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16148 and previous config saved to /var/cache/conftool/dbconfig/20210521-115246-root.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16147 and previous config saved to /var/cache/conftool/dbconfig/20210521-113742-root.json
  • 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2008.codfw.wmnet
  • 09:51 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2007.codfw.wmnet
  • 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2006.codfw.wmnet
  • 09:32 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2005.codfw.wmnet
  • 09:32 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:28 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:26 gehel: depooling wdqs1006 to catch up on lag
  • 09:24 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2008.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2007.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2006.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:14 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2005.codfw.wmnet
  • 08:56 kormat: deploying cumin2002 grants to production T276589
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1001.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2004.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2003.wikimedia.org
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16146 and previous config saved to /var/cache/conftool/dbconfig/20210521-082009-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P16145 and previous config saved to /var/cache/conftool/dbconfig/20210521-080540-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16144 and previous config saved to /var/cache/conftool/dbconfig/20210521-080506-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16143 and previous config saved to /var/cache/conftool/dbconfig/20210521-075002-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16142 and previous config saved to /var/cache/conftool/dbconfig/20210521-073459-root.json
  • 06:32 moritzm: installing libspring-java security updates on stretch
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16141 and previous config saved to /var/cache/conftool/dbconfig/20210521-053027-root.json
  • 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16140 and previous config saved to /var/cache/conftool/dbconfig/20210521-051523-root.json
  • 05:14 moritzm: installing graphviz security updates on stretch
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16139 and previous config saved to /var/cache/conftool/dbconfig/20210521-050020-root.json
  • 04:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1087.eqiad.wmnet
  • 04:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1087.eqiad.wmnet
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P16138 and previous config saved to /var/cache/conftool/dbconfig/20210521-044717-marostegui.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16137 and previous config saved to /var/cache/conftool/dbconfig/20210521-044516-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P16136 and previous config saved to /var/cache/conftool/dbconfig/20210521-044339-marostegui.json
  • 01:27 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa
  • 01:18 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa

2021-05-20

  • 21:45 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:54 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mwlog1001.eqiad.wmnet
  • 19:43 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16134 and previous config saved to /var/cache/conftool/dbconfig/20210520-193039-root.json
  • 19:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16133 and previous config saved to /var/cache/conftool/dbconfig/20210520-191536-root.json
  • 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.6
  • 19:01 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16132 and previous config saved to /var/cache/conftool/dbconfig/20210520-190031-root.json
  • 18:56 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16131 and previous config saved to /var/cache/conftool/dbconfig/20210520-184527-root.json
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1002: sync-file aborted: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 00m 00s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 7fb129f: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 05s)
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
  • 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:33 moritzm: installing graphviz security updates on buster
  • 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
  • 15:30 _joe_: test
  • 15:23 moritzm: installing graphviz security updates on buster
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
  • 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - T283196 (duration: 01m 05s)
  • 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames T283167 (duration: 01m 05s)
  • 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
  • 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
  • 12:30 kormat: Deploying wmfmariadbpy 0.7 T283228
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16125 and previous config saved to /var/cache/conftool/dbconfig/20210520-112026-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16124 and previous config saved to /var/cache/conftool/dbconfig/20210520-110522-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16123 and previous config saved to /var/cache/conftool/dbconfig/20210520-105018-root.json
  • 10:15 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw T266486 T268392 T273360
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16122 and previous config saved to /var/cache/conftool/dbconfig/20210520-093510-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16121 and previous config saved to /var/cache/conftool/dbconfig/20210520-093257-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16120 and previous config saved to /var/cache/conftool/dbconfig/20210520-091754-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16119 and previous config saved to /var/cache/conftool/dbconfig/20210520-090250-root.json
  • 08:56 godog: move icinga-wm to libera.chat
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16118 and previous config saved to /var/cache/conftool/dbconfig/20210520-084746-root.json
  • 07:44 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:41 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16117 and previous config saved to /var/cache/conftool/dbconfig/20210520-071723-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16116 and previous config saved to /var/cache/conftool/dbconfig/20210520-071432-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16115 and previous config saved to /var/cache/conftool/dbconfig/20210520-065928-root.json
  • 06:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 06:50 ryankemper: T283223 Write queue not draining fast enough for the next node to reboot, will finish reboot tomorrow
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16114 and previous config saved to /var/cache/conftool/dbconfig/20210520-064425-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16113 and previous config saved to /var/cache/conftool/dbconfig/20210520-062921-root.json
  • 06:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/PageProps.php: Backport: PageProps: be prepared that PageIdentity is not proper title (T283170) (duration: 01m 06s)
  • 06:08 elukey: powercycle ms-be2035 - no ssh available, no metrics for hours, I/O errors logged on the main tty on the serial console
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16112 and previous config saved to /var/cache/conftool/dbconfig/20210520-054402-root.json
  • 05:33 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 05:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16111 and previous config saved to /var/cache/conftool/dbconfig/20210520-052859-root.json
  • 05:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1011.eqiad.wmnet
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16110 and previous config saved to /var/cache/conftool/dbconfig/20210520-051355-root.json
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1011.eqiad.wmnet
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16109 and previous config saved to /var/cache/conftool/dbconfig/20210520-050025-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16108 and previous config saved to /var/cache/conftool/dbconfig/20210520-045919-marostegui.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16107 and previous config saved to /var/cache/conftool/dbconfig/20210520-045852-root.json
  • 01:01 mutante: signing puppet certs for doh2001 and doh2002.wikimedia.org (T283192)
  • 00:14 ejegg: updated fundraising CiviCRM from b3fb3c9cb0 to 35f5afb1b4
  • 00:13 ejegg: updated payments-wiki from 9f51ace546 to 6fac77f60e

2021-05-19

  • 22:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ sleep 3600 && mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=7200 --user=Lusccasdeutsch . # T278856 # 3 video files
  • 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2002.wikimedia.org
  • 22:27 Urbanecm: Start server-side upload for 1 video file (T283186)
  • 22:25 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:22 Urbanecm: Start server-side upload for 3 video files (T283102, T283054)
  • 22:22 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:12 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 22:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2001.wikimedia.org
  • 22:09 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 11s)
  • 22:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 22:00 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:51 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2001.wikimedia.org
  • 21:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 20:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
  • 19:40 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
  • 18:30 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 18:23 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.6 T281147
  • 18:17 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 16:13 volans: uploaded debmonitor-client_0.3.0 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16103 and previous config saved to /var/cache/conftool/dbconfig/20210519-154808-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16102 and previous config saved to /var/cache/conftool/dbconfig/20210519-153304-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16101 and previous config saved to /var/cache/conftool/dbconfig/20210519-151800-root.json
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16100 and previous config saved to /var/cache/conftool/dbconfig/20210519-150257-root.json
  • 13:33 kormat: uploaded wmfmariadb 0.7 packages to apt
  • 13:29 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16099 and previous config saved to /var/cache/conftool/dbconfig/20210519-131920-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16098 and previous config saved to /var/cache/conftool/dbconfig/20210519-131012-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16097 and previous config saved to /var/cache/conftool/dbconfig/20210519-125508-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16096 and previous config saved to /var/cache/conftool/dbconfig/20210519-124004-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16095 and previous config saved to /var/cache/conftool/dbconfig/20210519-122501-root.json
  • 11:45 matthiasmullie: "EU backports done"
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16093 and previous config saved to /var/cache/conftool/dbconfig/20210519-114203-marostegui.json
  • 11:41 mlitn@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules: Backport: Add a link: Set contenteditable=false on mobile (T281771) (duration: 01m 06s)
  • 11:14 mlitn@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 2 (duration: 01m 04s)
  • 11:13 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 1 (duration: 01m 34s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16091 and previous config saved to /var/cache/conftool/dbconfig/20210519-092630-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16090 and previous config saved to /var/cache/conftool/dbconfig/20210519-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16089 and previous config saved to /var/cache/conftool/dbconfig/20210519-085622-root.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16088 and previous config saved to /var/cache/conftool/dbconfig/20210519-084119-root.json
  • 08:28 marostegui: Stop MySQL on db1175 to upgrade kernel and mysql
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16087 and previous config saved to /var/cache/conftool/dbconfig/20210519-082713-marostegui.json
  • 08:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump (duration: 02m 24s)
  • 08:10 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump
  • 07:48 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes (duration: 02m 23s)
  • 07:45 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16086 and previous config saved to /var/cache/conftool/dbconfig/20210519-074530-root.json
  • 07:42 XioNoX: roll SNMP: filter out default logical interfaces (.0) to all network devices - T283060
  • 07:38 godog: add 100G to prometheus/ops eqiad
  • 07:31 marostegui: Deploy schema change on s3 codfw, lag will appear in codfw T266486 T268392 T273360
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16085 and previous config saved to /var/cache/conftool/dbconfig/20210519-073027-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16084 and previous config saved to /var/cache/conftool/dbconfig/20210519-071523-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16083 and previous config saved to /var/cache/conftool/dbconfig/20210519-070019-root.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1010.eqiad.wmnet
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 T280492', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1010.eqiad.wmnet
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P16081 and previous config saved to /var/cache/conftool/dbconfig/20210519-063345-marostegui.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16080 and previous config saved to /var/cache/conftool/dbconfig/20210519-062824-root.json
  • 06:18 Amir1: upgrading daily-article-l to mailman3 (T282271 T280322)
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16079 and previous config saved to /var/cache/conftool/dbconfig/20210519-061321-root.json
  • 06:04 legoktm: restarted mailman3 on lists1001
  • 06:01 legoktm: stopped mailman3 service on lists1001 for schema change
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16078 and previous config saved to /var/cache/conftool/dbconfig/20210519-055817-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16077 and previous config saved to /var/cache/conftool/dbconfig/20210519-055134-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16076 and previous config saved to /var/cache/conftool/dbconfig/20210519-054313-root.json
  • 05:17 marostegui: Compress a few tables on s3 T283125
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109', diff saved to https://phabricator.wikimedia.org/P16075 and previous config saved to /var/cache/conftool/dbconfig/20210519-045857-marostegui.json
  • 03:03 reedy@deploy1002: Synchronized php-1.37.0-wmf.5/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 01m 05s)
  • 03:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.6/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 02m 35s)

2021-05-18

  • 18:40 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 05m 16s)
  • 18:35 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:35 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 00m 07s)
  • 18:34 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:33 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 15m 39s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3da5a8b: Update IP addresses for Wiki Education Dashboard exemptions (T283096) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1002: Synchronized w/robots.php: 8224e53: robots.php: avoid using ContentHandler::getContentText() (T268041) (duration: 01m 04s)
  • 18:17 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'db1085 being decommissioned T282096', diff saved to https://phabricator.wikimedia.org/P16073 and previous config saved to /var/cache/conftool/dbconfig/20210518-160053-kormat.json
  • 15:30 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 15:23 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 07s)
  • 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1085.eqiad.wmnet
  • 14:38 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on all wikis - T238138 (duration: 01m 06s)
  • 14:32 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.6
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1085.eqiad.wmnet
  • 14:21 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.6 (duration: 79m 07s)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16067 and previous config saved to /var/cache/conftool/dbconfig/20210518-142042-root.json
  • 14:17 moritzm: installing remaining postgresql-11 updates (client tools and libs, servers already done)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16066 and previous config saved to /var/cache/conftool/dbconfig/20210518-140538-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16065 and previous config saved to /var/cache/conftool/dbconfig/20210518-135034-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16064 and previous config saved to /var/cache/conftool/dbconfig/20210518-133531-root.json
  • 13:02 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.6
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P16063 and previous config saved to /var/cache/conftool/dbconfig/20210518-125945-marostegui.json
  • 12:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16062 and previous config saved to /var/cache/conftool/dbconfig/20210518-124247-root.json
  • 12:40 Krinkle: krinkle@mw1002 purge-parsercache-now.php on pc1010 (spare, depooled), ref P16060, T280605, T282761
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16061 and previous config saved to /var/cache/conftool/dbconfig/20210518-122744-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16059 and previous config saved to /var/cache/conftool/dbconfig/20210518-121240-root.json
  • 12:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.4 (duration: 01m 28s)
  • 12:07 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.3 (duration: 01m 50s)
  • 12:04 hashar@deploy1002: clean aborted: Pruned MediaWiki: 1.37.0-wmf.1 (duration: 01m 16s)
  • 12:04 hashar: scap clean 1.37.0-wmf.1 1.37.0-wmf.3 and 1.37.0-wmf.4 # T281147
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16058 and previous config saved to /var/cache/conftool/dbconfig/20210518-115736-root.json
  • 11:41 moritzm: upgrading idp2001 to Java 11.0.11
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P16057 and previous config saved to /var/cache/conftool/dbconfig/20210518-112942-marostegui.json
  • 10:53 moritzm: upgrade idp-test to OpenJDK 11.0.11 T281345
  • 10:27 moritzm: installing OpenJDK updates on Hadoop/Druid/AQS/kafka-Jumbo
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16056 and previous config saved to /var/cache/conftool/dbconfig/20210518-102607-root.json
  • 10:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16055 and previous config saved to /var/cache/conftool/dbconfig/20210518-101104-root.json
  • 10:03 kormat: stopping mariadb on db1085 T282096
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16054 and previous config saved to /var/cache/conftool/dbconfig/20210518-095600-root.json
  • 09:47 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16053 and previous config saved to /var/cache/conftool/dbconfig/20210518-094732-kormat.json
  • 09:44 XioNoX: 👍
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16052 and previous config saved to /var/cache/conftool/dbconfig/20210518-094056-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1087 from dbctl T282093', diff saved to https://phabricator.wikimedia.org/P16051 and previous config saved to /var/cache/conftool/dbconfig/20210518-093552-marostegui.json
  • 09:32 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16050 and previous config saved to /var/cache/conftool/dbconfig/20210518-093228-kormat.json
  • 09:30 topranks: add peering sessions to AS8708 RCS & RDS on cr2-esams
  • 09:27 XioNoX: push test SNMP filter config on asw-a-codfw - T283060
  • 09:17 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16049 and previous config saved to /var/cache/conftool/dbconfig/20210518-091725-kormat.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P16048 and previous config saved to /var/cache/conftool/dbconfig/20210518-091717-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16047 and previous config saved to /var/cache/conftool/dbconfig/20210518-091702-root.json
  • 09:04 kormat@cumin1001: dbctl commit (dc=all): 'Set db1131 to weight 400 in s6/eqiad T280751', diff saved to https://phabricator.wikimedia.org/P16046 and previous config saved to /var/cache/conftool/dbconfig/20210518-090449-kormat.json
  • 09:02 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16045 and previous config saved to /var/cache/conftool/dbconfig/20210518-090215-kormat.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16044 and previous config saved to /var/cache/conftool/dbconfig/20210518-090159-root.json
  • 09:01 kormat@cumin1001: dbctl commit (dc=all): 'Remove s6 eqiad primary from 'api' group T280751', diff saved to https://phabricator.wikimedia.org/P16043 and previous config saved to /var/cache/conftool/dbconfig/20210518-090156-kormat.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16042 and previous config saved to /var/cache/conftool/dbconfig/20210518-084643-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16041 and previous config saved to /var/cache/conftool/dbconfig/20210518-083139-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P16040 and previous config saved to /var/cache/conftool/dbconfig/20210518-075532-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16039 and previous config saved to /var/cache/conftool/dbconfig/20210518-075458-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16038 and previous config saved to /var/cache/conftool/dbconfig/20210518-073955-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16037 and previous config saved to /var/cache/conftool/dbconfig/20210518-072451-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16036 and previous config saved to /var/cache/conftool/dbconfig/20210518-070947-root.json
  • 07:06 marostegui: Deploy schema change on s4 codfw, lag will appear in codfw T266486 T268392 T273360
  • 06:54 XioNoX: Homerify cloudsw ospf
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P16035 and previous config saved to /var/cache/conftool/dbconfig/20210518-064426-marostegui.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1083.eqiad.wmnet
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16034 and previous config saved to /var/cache/conftool/dbconfig/20210518-064033-root.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1083.eqiad.wmnet
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1083 from dbctl T281445', diff saved to https://phabricator.wikimedia.org/P16033 and previous config saved to /var/cache/conftool/dbconfig/20210518-062947-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16032 and previous config saved to /var/cache/conftool/dbconfig/20210518-062529-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16031 and previous config saved to /var/cache/conftool/dbconfig/20210518-061026-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16030 and previous config saved to /var/cache/conftool/dbconfig/20210518-055522-root.json
  • 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1009.eqiad.wmnet
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1009.eqiad.wmnet
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P16029 and previous config saved to /var/cache/conftool/dbconfig/20210518-052324-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16028 and previous config saved to /var/cache/conftool/dbconfig/20210518-050949-marostegui.json
  • 05:06 marostegui: Restart db1115 mysql
  • 00:56 eileen: civicrm revision changed from 38ac15233f to b3fb3c9cb0, config revision is 1f8d0a6bfa

2021-05-17

  • 23:33 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 01s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 55s)
  • 21:46 sbassett: Deployed security patch (and ran scap sync-l10n) for T260865
  • 19:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize WikidataCompletionSearchClicks Event Platform migration - T282140 (duration: 00m 58s)
  • 19:13 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to Event Platform on group 0 and group 1 - T238138 (duration: 00m 59s)
  • 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/skins/Vector/includes/FeatureManagement/Requirements/LanguageInHeaderTreatmentRequirement.php: e180b99: Allow `languageinheader` query param to fully control treatment of languages (T282543) (duration: 00m 58s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: c30f92b5: Remove expired throttle rule (duration: 00m 59s)
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16022 and previous config saved to /var/cache/conftool/dbconfig/20210517-165322-root.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16021 and previous config saved to /var/cache/conftool/dbconfig/20210517-163819-root.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16020 and previous config saved to /var/cache/conftool/dbconfig/20210517-162315-root.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16019 and previous config saved to /var/cache/conftool/dbconfig/20210517-160811-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16018 and previous config saved to /var/cache/conftool/dbconfig/20210517-153311-root.json
  • 15:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.5
  • 15:26 elukey@deploy1002: Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359 (duration: 19m 48s)
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16017 and previous config saved to /var/cache/conftool/dbconfig/20210517-151807-root.json
  • 15:06 elukey@deploy1002: Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16016 and previous config saved to /var/cache/conftool/dbconfig/20210517-150303-root.json
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16015 and previous config saved to /var/cache/conftool/dbconfig/20210517-144800-root.json
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16014 and previous config saved to /var/cache/conftool/dbconfig/20210517-141737-marostegui.json
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16013 and previous config saved to /var/cache/conftool/dbconfig/20210517-141627-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16012 and previous config saved to /var/cache/conftool/dbconfig/20210517-140438-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16011 and previous config saved to /var/cache/conftool/dbconfig/20210517-140435-root.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16010 and previous config saved to /var/cache/conftool/dbconfig/20210517-140123-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16009 and previous config saved to /var/cache/conftool/dbconfig/20210517-134934-root.json
  • 13:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16008 and previous config saved to /var/cache/conftool/dbconfig/20210517-134931-root.json
  • 13:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16007 and previous config saved to /var/cache/conftool/dbconfig/20210517-134619-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16006 and previous config saved to /var/cache/conftool/dbconfig/20210517-133431-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16005 and previous config saved to /var/cache/conftool/dbconfig/20210517-133427-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16004 and previous config saved to /var/cache/conftool/dbconfig/20210517-133116-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16003 and previous config saved to /var/cache/conftool/dbconfig/20210517-131927-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16002 and previous config saved to /var/cache/conftool/dbconfig/20210517-131924-root.json
  • 13:10 marostegui: Upgrade kernel and mysql (10.4.19) on db1144:3314, db1144:3315
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314, db1144:3315 for kernel and mysql upgrade', diff saved to https://phabricator.wikimedia.org/P16001 and previous config saved to /var/cache/conftool/dbconfig/20210517-130935-marostegui.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16000 and previous config saved to /var/cache/conftool/dbconfig/20210517-125742-marostegui.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15999 and previous config saved to /var/cache/conftool/dbconfig/20210517-123548-root.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15998 and previous config saved to /var/cache/conftool/dbconfig/20210517-122045-root.json
  • 12:08 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:07 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15997 and previous config saved to /var/cache/conftool/dbconfig/20210517-120541-root.json
  • 12:04 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:55 marostegui: Deploy schema change on s8 codfw, lag will appear in codfw T266486 T268392 T273360
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15996 and previous config saved to /var/cache/conftool/dbconfig/20210517-115037-root.json
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswikibooks --fix
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswiki --fix
  • 11:49 Urbanecm: 11:49:22 Synchronized wmf-config/InitialiseSettings.php: a73fe2d: Make the Malaysian talk namespaces names consistent (duration: 01m 08s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1e06f83: Enable SandboxLink at azwiki (T282954) (duration: 01m 08s)
  • 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 32e4343: urwiki: Grant `editprotected` to eliminators (T281274) (duration: 01m 08s)
  • 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 36d29a6: Enable NewUserMessage on ptwikinews (T282845) (duration: 01m 09s)
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P15995 and previous config saved to /var/cache/conftool/dbconfig/20210517-111343-marostegui.json
  • 11:07 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T282886)
  • 11:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:06 urbanecm@deploy1002: Synchronized static/images/project-logos/: b1da7aa: Update bnwiki project logo (T282886) (duration: 01m 42s)
  • 11:03 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json
  • 10:37 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 07s)
  • 10:36 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:30 moritzm: installing postgresql-11 security updates
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json
  • 09:43 hashar: Restarted CI Jenkins to update the instant-messaging and ircbot plugins # T271122
  • 09:33 moritzm: installing libimage-exiftool-perl security updates
  • 09:29 topranks: push CR691140 to eqiad and codfw core routers - T282809
  • 09:18 hashar: Restarting CI Jenkins to upgrade the Gearman plugin # T281737
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P15990 and previous config saved to /var/cache/conftool/dbconfig/20210517-091636-marostegui.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15989 and previous config saved to /var/cache/conftool/dbconfig/20210517-091604-root.json
  • 09:06 ema: cp_eqsin: run confd-reload-vcl manually to fix /var/run/reload-vcl-state T282880
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15988 and previous config saved to /var/cache/conftool/dbconfig/20210517-090101-root.json
  • 08:52 vgutierrez: pool cp5016
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15987 and previous config saved to /var/cache/conftool/dbconfig/20210517-084557-root.json
  • 08:45 vgutierrez: depool cp5016
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15986 and previous config saved to /var/cache/conftool/dbconfig/20210517-083053-root.json
  • 08:28 Urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from global_group_restrictions where ggr_group="Indic_Bots"; # T282968
  • 08:26 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 93e61f7: Use svwiki 20th anniversary logos (T282389) (duration: 01m 08s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 0f356a3: Add svwiki 20th anniversary logos (T282389) (duration: 01m 12s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15985 and previous config saved to /var/cache/conftool/dbconfig/20210517-061232-marostegui.json
  • 06:01 kormat: restarting mariadb on db1131 to pick up report_host T266483
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15984 and previous config saved to /var/cache/conftool/dbconfig/20210517-055556-root.json
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1079.eqiad.wmnet
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15983 and previous config saved to /var/cache/conftool/dbconfig/20210517-054053-root.json
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1079.eqiad.wmnet
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15982 and previous config saved to /var/cache/conftool/dbconfig/20210517-052549-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1079 from dbctl T282079', diff saved to https://phabricator.wikimedia.org/P15981 and previous config saved to /var/cache/conftool/dbconfig/20210517-051728-marostegui.json
  • 05:13 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1131 until it's reimaged to buster T282124', diff saved to https://phabricator.wikimedia.org/P15980 and previous config saved to /var/cache/conftool/dbconfig/20210517-051312-kormat.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15979 and previous config saved to /var/cache/conftool/dbconfig/20210517-051045-root.json
  • 05:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 master and set section read-write T282124', diff saved to https://phabricator.wikimedia.org/P15978 and previous config saved to /var/cache/conftool/dbconfig/20210517-050740-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T282124', diff saved to https://phabricator.wikimedia.org/P15977 and previous config saved to /var/cache/conftool/dbconfig/20210517-050526-kormat.json
  • 05:05 kormat: Starting s6 eqiad failover from db1131 to db1173 - T282124
  • 04:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:46 kormat@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T282124', diff saved to https://phabricator.wikimedia.org/P15976 and previous config saved to /var/cache/conftool/dbconfig/20210517-044657-kormat.json
  • 04:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124', diff saved to https://phabricator.wikimedia.org/P15974 and previous config saved to /var/cache/conftool/dbconfig/20210517-043148-marostegui.json
  • 02:10 legoktm: uninstalled python3-dbg on lists1001
  • 01:31 legoktm: restarted mailman3-web
  • 00:13 legoktm: installing python3-dbg on lists1001
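
Note on the staged repools logged above: the dbctl entries for hosts such as db1127 step through 25%, 50%, 75% and 100%, roughly 15 minutes apart. The following is a minimal sketch, not part of dbctl or any Wikimedia tooling, of how those steps could be pulled out of log lines like the ones in this section. The function names (`repool_steps`, `step_gaps_seconds`) and the regular expression are hypothetical and assume the entry wording shown here; the timestamps are read from the saved-config filename (`YYYYMMDD-HHMMSS`).

```python
import re
from datetime import datetime

# Assumed entry shape, taken from the log lines in this section:
#   "... 'db1127 (re)pooling @ 25%: ...' ... /dbconfig/20210517-095312-root.json"
REPOOL_RE = re.compile(
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%.*?"
    r"/dbconfig/(?P<ts>\d{8}-\d{6})-"
)


def repool_steps(lines):
    """Return {host: [(datetime, percent), ...]}, each list sorted by time."""
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match is None:
            continue  # not a staged-repool entry (e.g. a plain "Depool")
        when = datetime.strptime(match.group("ts"), "%Y%m%d-%H%M%S")
        steps.setdefault(match.group("host"), []).append(
            (when, int(match.group("pct")))
        )
    for host_steps in steps.values():
        host_steps.sort()
    return steps


def step_gaps_seconds(host_steps):
    """Seconds elapsed between consecutive repool steps of one host."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(host_steps, host_steps[1:])
    ]


# The four db1127 entries from 2021-05-17, newest first as in the log.
SAMPLE = [
    "10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json",
    "10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json",
    "10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json",
    "09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json",
]

if __name__ == "__main__":
    for host, host_steps in repool_steps(SAMPLE).items():
        percents = [pct for _, pct in host_steps]
        print(host, percents, step_gaps_seconds(host_steps))
```

Entries that do not match the assumed wording are skipped rather than reported, so a host that was only depooled will not appear in the result.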

2021-05-16

  • 22:45 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=tawiki wikilove # T280326
  • 20:46 legoktm: restarted mailman3-web
  • 19:38 legoktm: restarted mailman3-web
  • 17:29 Amir1: restart mailman3-web
  • 02:39 legoktm: restarting mailman3-web on lists1001 again
  • 00:53 legoktm: restarted mailman3-web on lists1001, uwsgi looked like it got stuck, consuming all CPU/memory

2021-05-15

  • 12:33 Amir1: set fr_quality to 0 for all revisions on several wikis (T279761)
  • 06:54 Amir1: migrating most of the last mailing lists of T280322

2021-05-14

  • 20:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet
  • 20:32 mutante: people1002 - decom'ing - please use people1003 and see list mail
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1002.eqiad.wmnet
  • 18:58 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:58 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:39 cdanis: ✔️ cdanis@install1003.wikimedia.org ~ 🕝☕ sudo systemctl restart squid.service
  • 18:14 mutante: people1003/people2002: awk -F: '$6 ~ "^\/home" {print $1,$6}' /etc/passwd | while read line ; do user=${line% *}; dir=${line#* }; sudo mkdir -p ${dir}/public_html; sudo chown $user ${dir}/public_html; done (courtesy of Jbond)
  • 17:49 bblack: install1003 - restored normal resolv.conf + re-enabled+ran puppet
  • 17:41 bblack: install1003 - restart squid
  • 17:35 bblack: install1003 - puppet disabled and /etc/resolv.conf manually patched over to deal with a current issue
  • 17:25 cdanis: rolled back cr1-eqiad/cr2-eqiad interface disables T282881
  • 17:10 cdanis: cdanis@re0.cr1-eqiad# set interfaces gr-3/3/0.1 disable # T282881
  • 17:03 cdanis: cdanis@re0.cr2-eqiad# set interfaces gr-4/3/0.2 disable # T282881
  • 15:22 cdanis@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:22 cdanis@cumin2002: START - Cookbook sre.network.cf
  • 15:05 Urbanecm: Start server-side upload for 1 video file (T282874)
  • 14:09 andrew@deploy1002: Finished deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard (duration: 04m 15s)
  • 14:04 andrew@deploy1002: Started deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard
  • 12:54 bblack: re-running puppet agent on cp5*
  • 12:19 jbond42: run puppet on CP servers
  • 04:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisionlist/RevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisiondelete/RevDelRevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:18 ariel@deploy1002: Finished deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path (duration: 00m 03s)
  • 04:18 ariel@deploy1002: Started deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path
  • 03:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/MapSources/includes/specials/MapSourcesPage.php: fix PHP notice T282833 (duration: 01m 07s)
  • 03:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/WikiPage.php: T282844 (duration: 01m 06s)
  • 03:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/PageArchive.php: T282844 (duration: 01m 07s)
  • 03:16 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/Revision/RevisionArchiveRecord.php: fix DeletedContributions breakage T282844 (duration: 01m 07s)
  • 03:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/logging/LogEventsList.php: fix PHP notice T282834 (duration: 01m 08s)
  • 00:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`

2021-05-13

  • 23:53 mutante: [sodium:~] $ sudo systemctl start update-ubuntu-mirror.service
  • 23:50 mutante: [sodium:~] $ sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 23:22 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikimediaEvents: Backport: Fix "final_state: vector" bug in VectorPrefDiffInstrumentation (T261842) (duration: 01m 07s)
  • 23:11 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WikiLove extension on tawiki (T280326) (duration: 01m 07s)
  • 23:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 9dc74e4: Revert "Enable media change tags on wikipedias" (T266067, T282822) (duration: 01m 07s)
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:39 dancy@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GeoData/includes/Hooks.php: Backport: Make sure mId exists (T282735) (duration: 01m 08s)
  • 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 80e5b9d: cd113a7: Enable structured_task/article/link_suggestion_interaction schema (T278177) (duration: 01m 06s)
  • 18:59 Urbanecm: Morning B&C is going to take a few more minutes
  • 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2001.codfw.wmnet
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 0856ae1: ca52e78: GrowthExperiments backports (T282711, T282175) (duration: 01m 08s)
  • 18:26 mutante: people2001 is going down - people1003 (eqiad) and people2002 (codfw) are your replacements on bullseye
  • 18:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2001.codfw.wmnet
  • 18:22 Urbanecm: Start server-side upload for 2 video files (T282643, T282644)
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4cd6a78: Growth features: Push elwiki and cawiki out of dark mode (T280673; T280172) (duration: 01m 07s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04eb9d3: Enable media change tags on wikipedias (T266067) (duration: 01m 07s)
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3300c3: 59c8448: Enable Extension:MediaSearch on (test)commons (T265939) (duration: 01m 08s)
  • 17:20 andrew@deploy1002: Finished deploy [horizon/deploy@3d160f6]: Adding Database dashboards (duration: 04m 08s)
  • 17:16 andrew@deploy1002: Started deploy [horizon/deploy@3d160f6]: Adding Database dashboards
  • 16:36 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1005 back to config (T273278) (duration: 01m 07s)
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 16:24 effie: rebooting poolcounter1005
  • 16:09 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1005 will be rebooted for updates (T273278) (duration: 01m 07s)
  • 15:58 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1004 back to config (T273278) (duration: 01m 07s)
  • 15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 15:46 effie: restarting poolcounter1004
  • 15:27 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1004 will be rebooted for updates (T273278) (duration: 01m 08s)
  • 14:49 Urbanecm: Start server-side upload for 1 video file (T282785)
  • 14:07 Urbanecm: Start server-side upload for 3 video files (T282558, T282556)
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches ([[gerrit:690070|]] [[gerrit:690071|]] [[gerrit:690072|]] [[gerrit:690073|]]) (T278116 T278117 T278114 T278177 T278487 T278112 T278111 T278118) (duration: 01m 09s)
  • 11:00 hnowlan: deleting packages still referenced by jessie components: `sudo -i reprepro clearvanished --delete`
  • 10:46 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:40 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:31 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:21 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:43 kevinbazira@deploy1002: Finished deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723 (duration: 32m 50s)
  • 07:10 kevinbazira@deploy1002: Started deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723
  • 05:54 _joe_: running docker image prune on contint1001, which has 722 unlinked images stored in its docker daemon
  • 01:20 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2021-05-12

  • 23:48 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 07s)
  • 23:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: ef41396: PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 08s)
  • 23:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:27 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 21:54 ryankemper: T280382 `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 20:57 ottomata: starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - T273789
  • 20:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:25 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:15 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:11 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:10 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:07 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:05 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 19:00 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin2001` tmux session `elastic_restarts`
  • 19:00 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 18:59 ryankemper: [Elastic] Restarted `*search*` services on `elastic2058`
  • 18:48 mutante: rsyncing home dirs of people1003 over to people2002 as well (T280989)
  • 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 3999be1: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s)
  • 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb65aff: Update wordmark and tagline for kawiki (T278251; 2/2) (duration: 01m 09s)
  • 18:26 urbanecm@deploy1002: Synchronized static/images/mobile/: eb65aff: Update wordmark and tagline for kawiki (T278251; 1/2) (duration: 01m 06s)
  • 18:25 urbanecm@deploy1002: sync-file aborted: eb65aff: Update wordmark and tagline for kawiki (T278251) (duration: 00m 00s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0cd3297: Disable Education Program namespaces in cswiki (T282691) (duration: 01m 15s)
  • 18:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: 7f14913: Modern keys must be unset (T282646) (duration: 01m 08s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11defd4: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 23s)
  • 16:15 hnowlan: including envoyproxy_1.15.5-1_amd64.changes with reprepro
  • 15:51 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet
  • 14:45 aborrero@cumin2001: START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet
  • 14:02 marostegui: Upgraded mysql on clouddb1015
  • 14:01 marostegui: Upgraded mysql on clouddb1014
  • 13:57 kormat: uploaded wmfmariadbpy 0.6.1 for bullseye
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json
  • 13:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:06 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:05 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json
  • 12:44 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:42 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json
  • 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 6cc2530: c268d08: b89592e: 7620953: 8fd7610: GrowthExperiments backports (duration: 01m 17s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json
  • 11:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9939edb: zhwikinews: Allow sysops to grant/revoke transwiki group (T273405) (duration: 02m 17s)
  • 10:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:32 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 10:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 10:01 effie: reboot poolcounter2003 and poolcounter2004
  • 09:55 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1074 from dbctl T281959', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
  • 08:23 jbond42: rolling restart of ats
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 T282535', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 T282535', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
  • 05:00 marostegui: Stop MySQL on labsdb1009 labsdb1010 labsdb1011 T282524 T282523 T282522
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T282535', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T282535', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
  • 04:38 marostegui: Drop testing mailman3 databases T281548
  • 04:36 Amir1: importing archives of wikitech-l (T280322)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:35 mutante: people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run T280989
  • 01:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 08s)
  • 01:17 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 16s)
  • 00:54 mutante: made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
  • 00:51 mutante: [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \;
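
The 00:51 entry above runs `find . -type d -name public_html -exec chmod 555 {} \;` from `/home`. As a rough illustration only, the same traversal could be sketched in Python. The function name `make_public_html_readonly` is hypothetical and is not part of any tooling mentioned in this log.

```python
import os

def make_public_html_readonly(root):
    """Set mode 0o555 on every real directory named public_html under root.

    Hypothetical helper mirroring `find . -type d -name public_html -exec chmod 555 {} \\;`.
    Returns the list of paths that were changed.
    """
    changed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # `-type d` in find does not match symlinks, so skip them here too.
            if name == "public_html" and not os.path.islink(path):
                os.chmod(path, 0o555)
                changed.append(path)
    return changed
```

Mode 555 leaves the directories readable and traversable while removing write permission for everyone, which matches the stated intent of the 00:54 entry.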

2021-05-11

  • 23:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ec37795: Change namespace names and aliases on tiwiki and tiwiktionary (T263840) (duration: 01m 07s)
  • 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5bc40ac: ptwiki: Use celebration logos in new vector (T281925) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eac843a: Make DT source mode toolbar available as beta on all wikis (T279124) (duration: 01m 12s)
  • 23:06 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-pt-20.png: 60e6e4e: ptwiki: Add wikipedia-pt-20.png (T281925) (duration: 01m 08s)
  • 23:02 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: e35199b: Adding square logo and wordmark for ptwiki 20 years celebration (T281925) (duration: 01m 50s)
  • 22:14 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lists1002.wikimedia.org
  • 22:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts lists1002.wikimedia.org
  • 21:37 Urbanecm: Start server-side upload for 3 video files (T282566, T282565, T282559)
  • 21:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 21:34 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 20:52 legoktm: upgraded mailman3 on lists1001
  • 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2002.codfw.wmnet
  • 20:24 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 06m 57s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 00m 05s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 17m 01s)
  • 20:00 mforns@deploy1002: Started deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 19:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2002.codfw.wmnet
  • 19:46 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 09m 45s)
  • 19:37 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:33 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.5
  • 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 00m 07s)
  • 19:29 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:28 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 45m 45s)
  • 18:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:53 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on testwiki - T238138 (duration: 01m 09s)
  • 18:52 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 18:20 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.5 (duration: 09m 43s)
  • 18:10 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 17:36 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 01m 25s)
  • 17:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:35 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:34 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 02m 27s)
  • 17:33 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:32 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:31 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev (duration: 01m 59s)
  • 17:29 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev
  • 17:20 mutante: the backend for people.wikimedia.org switched from people1002 to people1003, the people.wikimedia.org CNAME has been updated. MOTD is about to be updated to inform users.
  • 17:18 legoktm: disabled pipermail redirects on lists.wikimedia.org
  • 17:07 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:12 jynus: restarting bacula-dir on backup1001, stuck process
  • 15:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 15:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog1001.eqiad.wmnet
  • 15:55 bstorm: restart haproxy on dbproxy1018/9 to remove old config
  • 15:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 15:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog2001.codfw.wmnet
  • 15:37 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:36 dancy@deploy1002: sync-world aborted: testwikis wikis to 1.37.0-wmf.4 (duration: 02m 04s)
  • 15:34 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 dancy@deploy1002: scap failed: RuntimeError scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details) (duration: 17m 36s)
  • 15:31 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:27 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog2001.codfw.wmnet
  • 15:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 15:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:57 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:49 moritzm: installing busybox security updates
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:27 moritzm: installing cgal security updates
  • 14:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 hashar: Restarted CI Jenkins with a snapshot of the Gearman Jenkins plugin # T281737
  • 14:10 hashar: Restarted CI Jenkins for plugin upgrade # T282433
  • 14:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:01 hashar: Restarted releases Jenkins for plugin upgrade # T282433
  • 13:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1d4d007: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 02s)
  • 13:39 jbond42: rolling restart of ats-backend
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1027.eqiad.wmnet
  • 12:11 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1027.eqiad.wmnet
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913 and previous config saved to /var/cache/conftool/dbconfig/20210511-114540-root.json
  • 11:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912 and previous config saved to /var/cache/conftool/dbconfig/20210511-113036-root.json
  • 11:16 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add P2671 and P4839 to deprecated properties list (T280779) (duration: 00m 58s)
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911 and previous config saved to /var/cache/conftool/dbconfig/20210511-111532-root.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910 and previous config saved to /var/cache/conftool/dbconfig/20210511-110029-root.json
  • 10:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:46 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P15909 and previous config saved to /var/cache/conftool/dbconfig/20210511-102303-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15908 and previous config saved to /var/cache/conftool/dbconfig/20210511-102212-root.json
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15907 and previous config saved to /var/cache/conftool/dbconfig/20210511-100708-root.json
  • 09:54 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw2002-dev.codfw.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15904 and previous config saved to /var/cache/conftool/dbconfig/20210511-095204-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15903 and previous config saved to /var/cache/conftool/dbconfig/20210511-093701-root.json
  • 09:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 08:37 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:36 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:34 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:32 moritzm: installing hivex security updates
  • 08:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15901 and previous config saved to /var/cache/conftool/dbconfig/20210511-082038-marostegui.json
  • 08:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:55 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15899 and previous config saved to /var/cache/conftool/dbconfig/20210511-070742-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15898 and previous config saved to /var/cache/conftool/dbconfig/20210511-065238-root.json
  • 06:50 marostegui: Stop replication on db2094:3318 T282514
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15897 and previous config saved to /var/cache/conftool/dbconfig/20210511-063734-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15896 and previous config saved to /var/cache/conftool/dbconfig/20210511-062231-root.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1082.eqiad.wmnet
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1082.eqiad.wmnet
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:11 marostegui: Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15894 and previous config saved to /var/cache/conftool/dbconfig/20210511-050816-marostegui.json
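
The `(re)pooling @ N%` entries above step a host back into service in stages (25%, 50%, 75%, 100%). A minimal sketch for pulling those stages out of the log text follows. The regular expression is an assumption based only on the entry format visible here, the function names are hypothetical, and the sample lines are copies of the db1182 entries with the trailing "diff saved to" part trimmed.

```python
import re
from datetime import datetime

# Assumed entry shape, taken from this log:
# "06:22 user@host: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', ..."
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)

def repool_steps(lines):
    """Return {instance: [(HH:MM, percent), ...]} sorted by time within one day."""
    steps = {}
    for line in lines:
        m = REPOOL_RE.search(line)
        if m:
            steps.setdefault(m["instance"], []).append((m["time"], int(m["pct"])))
    for seq in steps.values():
        seq.sort()  # the log is newest-first, so reorder chronologically
    return steps

def step_gaps_minutes(seq):
    """Minutes between consecutive repool steps."""
    times = [datetime.strptime(t, "%H:%M") for t, _ in seq]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]

SAMPLE = [
    "07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182'",
    "06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182'",
    "06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182'",
    "06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182'",
]

if __name__ == "__main__":
    for instance, seq in repool_steps(SAMPLE).items():
        print(instance, seq, step_gaps_minutes(seq))
```

For the db1182 entries the stages are 15 minutes apart. Sorting on the `HH:MM` string only works within a single date section, so entries from different days would need the date heading attached first.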

2021-05-10

  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53: Update messages used for tech CoC (T280886) (duration: 00m 56s)
  • 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: ba8b786: NO-OP: Enable ChessBrowser on beta (T244075) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: dd6fa65: Use ptwiki 20th anniversary logos (T281925) (duration: 00m 59s)
  • 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: f2a76b1: Add ptwiki 20th anniversary logos (T281925) (duration: 00m 58s)
  • 22:28 eileen: civicrm revision changed from 2052d79248 to 38ac15233f, config revision is 47f21e4568
  • 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 00m 56s)
  • 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 01m 01s)
  • 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
  • 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 T282348
  • 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
  • 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 02m 07s)
  • 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 55s)
  • 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 21s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 36s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 15s)
  • 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 04m 10s)
  • 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: loginwiki: Allow users to mark Notifications as read (T264834) (duration: 00m 57s)
  • 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Disable LocalisationUpdate, part I (T158360) (duration: 00m 58s)
  • 18:24 XioNoX: add cmooney to all network devices
  • 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Enable VE desktop section edit links (T280291) (duration: 00m 57s)
  • 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712) (duration: 00m 57s)
  • 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored (duration: 00m 57s)
  • 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
  • 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
  • 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565) (duration: 00m 59s)
  • 15:20 elukey: restart rsyslog on rpki1001
  • 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
  • 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
  • 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 11:46 Urbanecm: EU B&C window done
  • 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3418237: Disabling Education Program namespaces in Russian Wikipedia (T282112) (duration: 00m 57s)
  • 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8bef11c: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T282007) (duration: 00m 57s)
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # T262155
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # T262155
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 068cd7e: Change namespace name and aliases on jawikivoyage (T262155) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9209d96: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies (T281968) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7f6f849: Add tmpSerializeEmptyListsAsObjects to Wikibase.php (T241422) (duration: 01m 01s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6138c64: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T241422) (duration: 00m 57s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 23271dd: Enable ReferencePreviews as full default on Marathi wiki (T282147) (duration: 00m 57s)
  • 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (3/3; T281972) (duration: 00m 56s)
  • 11:08 urbanecm@deploy1002: sync-file aborted: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (T281972) (duration: 00m 04s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (2/3; T281972) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (1/3; T281972) (duration: 00m 57s)
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
  • 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:31 moritzm: installing openjdk-11 security updates
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
  • 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - T281673
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T281959', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
  • 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
  • 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - T281673
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
  • 08:24 XioNoX: push pfw policies - T282286
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
  • 07:38 hashar: Restarted CI Jenkins # T281737
  • 06:37 elukey: apt-get clean on rpki1001 to free some space
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl T281794', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json
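
The repool entries above step through 25%, 50%, 75% and 100% at roughly 15-minute intervals (for example db1129 at 05:46, 06:01, 06:16 and 06:31). The sketch below only reproduces that timing pattern as read off the log. It does not call dbctl, and the function name `repool_schedule`, the step list and the interval are assumptions made for illustration, not the tooling's actual configuration.

```python
from datetime import datetime, timedelta

# Percentages and spacing are inferred from the log entries above;
# they are assumptions, not values taken from dbctl itself.
STEPS = (25, 50, 75, 100)
INTERVAL = timedelta(minutes=15)


def repool_schedule(first_step, steps=STEPS, interval=INTERVAL):
    """Return (HH:MM, percent) pairs for a staged repool starting at first_step."""
    start = datetime.strptime(first_step, "%H:%M")
    return [
        ((start + i * interval).strftime("%H:%M"), pct)
        for i, pct in enumerate(steps)
    ]


# Print the expected timeline for a repool whose first step lands at 05:46.
for when, pct in repool_schedule("05:46"):
    print(f"{when} (re)pooling @ {pct}%")
```

Comparing such a schedule with the logged timestamps is one quick way to spot a repool that stalled between steps.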

2021-05-09

  • 21:44 legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
  • 18:28 legoktm: systemctl restart mailman3, bounce runner died again (T282348)
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 09:16 legoktm: mailman3 live hacked patch at https://phabricator.wikimedia.org/T282348#7072358 to fix bounce queue
  • 06:21 legoktm: restarting mailman3 service, bounce runner died
  • 04:27 Amir1: starting upgrade of batch H of mailing lists (T280322)

2021-05-08

  • 17:18 Amir1: starting upgrade of batch G of mailing lists (T280322)

2021-05-07

  • 21:40 legoktm: deleted education@ from MM3, didn't import properly
  • 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
  • 21:33 legoktm: fixed owner for wdqs-gui-build list
  • 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
  • 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 18:23 brennen: 1.37.0-wmf.4 train status (T281145): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
  • 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: LinkBatch: skip bad input (T282180 T282070) (duration: 01m 06s)
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
  • 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
  • 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
  • 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
  • 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
  • 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
  • 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
  • 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
  • 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 13:04 Urbanecm: Start server-side upload for 1 video file (T281927)
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
  • 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
  • 09:55 dcausse: depooling wdqs1012 T280382, T282222
  • 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - T281673
  • 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - T281673
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
  • 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear T266486 T268392 T273360
  • 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)
  • 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T282093', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in CirrusSearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in CirrusSearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 hosts to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart
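
Note on the [Elastic] entries from 03:04 to 03:11 above: the unban works by setting the transient `cluster.routing.allocation.exclude` values to null, which clears the allocation exclusions so shards can be placed on those nodes again. A minimal sketch of how such a request body could be built, mirroring the shape of the logged `curl` payload. The helper name `exclusion_settings` and the node pattern `elastic2033*` are hypothetical, chosen only for illustration; the log does not record how the original ban was expressed.

```python
import json


def exclusion_settings(hosts=None, names=None):
    """Build a body for PUT /_cluster/settings (hypothetical helper).

    Passing nothing yields nulls, which clears the exclusions (unban).
    """
    def join(values):
        # Exclusions are given as one comma-separated string;
        # None serializes to JSON null and resets the setting.
        return ",".join(values) if values else None

    return {
        "transient": {
            "cluster.routing.allocation.exclude": {
                "_host": join(hosts),
                "_name": join(names),
            }
        }
    }


# Same shape as the payload logged at 03:08: clear all exclusions.
unban_all = json.dumps(exclusion_settings())

# Assumed example of banning a single node by name pattern, in line with
# the plan to ban one node at a time and wait for shards to redistribute.
ban_one = json.dumps(exclusion_settings(names=["elastic2033*"]))

print(unban_all)
print(ban_one)
```

Serializing with `json.dumps` rather than typing the body by hand keeps the braces balanced, which matters when the payload is pasted into a shell command.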

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message; will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" () (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123) GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

Archives

See Server Admin Log/Archives.